On Mar 16, 2012, at 19:57, Erik Corry wrote:

> 2012/3/17 Norbert Lindenberg <[email protected]>:
>> Thanks for your comments - a few replies below.
>>
>> Norbert
>>
>>
>> On Mar 16, 2012, at 1:55, Erik Corry wrote:
>>
>>> However I think we probably do want the /u modifier on regexps to
>>> control the new backward-incompatible behaviour. There may be some
>>> way to relax this for regexp literals in opted in Harmony code, but
>>> for new RegExp(...) and for other string literals I think there are
>>> rather too many inconsistencies with the old behaviour.
>>
>> Before asking developers to add /u, we should really have some evidence that
>> not doing so would cause actual compatibility issues for real applications.
>> Do you know of any examples?
>
> No. In general I don't think it is realistic to try to prove that
> problematic code does not exist, since that requires quantifying over
> all existing JS code, which is clearly impossible.

We cannot prove its absence, but we can discuss the likelihood of its existence, and showing an actual example is a quick way to bring that discussion to a conclusion. I note that you didn't challenge my claim about the (un)likelihood of the existence of applications that depend on Deseret characters not being mapped to lower case while calling toLowerCase...

>>> The algorithm given for codePointAt never returns NaN. It should
>>> probably do that for indices that hit a trail surrogate that has a
>>> lead surrogate preceding it.
>>
>> NaN is not a valid code point, so it shouldn't be returned. If we want to
>> indicate access to a trailing surrogate code unit as an error, we should
>> throw an exception.
>
> Then you should probably remove the text: "If there is no code unit at
> that position, the result is NaN" from your proposal :-)
>
> I am wary of using exceptions for non-exceptional data-driven events,
> since performance is usually terrible and it's arguably an abuse of
> the mechanism. Your iterator code looks fine to me and needs neither
> NaN nor exceptions.

The iterator or codePointAt? The latter has the statement you quote, which shows a disconnect between what I wrote a few days ago starting from the charCodeAt spec, and what I think when I don't look at that spec. charCodeAt (and hence the current implementation of codePointAt) returns NaN when given an index < 0 or ≥ length. The normal behavior when accessing elements or properties that don't exist is to return undefined. We can't fix charCodeAt anymore, but I can still fix codePointAt.

>>> Perhaps it is outside the scope of this proposal, but it would also
>>> make a lot of sense to add some named character classes to RegExp.
>>
>> It would make a lot of sense, but is outside the scope of this proposal. One
>> step at a time :-)
>
> I can see that. But if we are going to have multiple versions of the
> RegExp syntax we should probably aim to keep the number down.

True.
And in the meantime Brendan pointed to some regex proposals that try to address a different set of Unicode-related issues, also with a /u flag. Some coordination is clearly needed.

http://blog.stevenlevithan.com/archives/fixing-javascript-regexp
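
To make the codePointAt point concrete, here is a rough sketch of the behaviour under discussion. It is not the proposal's spec text: the names codePointAtSketch and codePoints are hypothetical, and the sketch assumes that an out-of-range index yields undefined (where charCodeAt yields NaN) and that an index landing on a trail surrogate simply returns that code unit, so neither NaN nor an exception is needed.

```javascript
// Hypothetical helper illustrating the discussion; not the proposal's algorithm text.
function codePointAtSketch(str, index) {
  // Assumption: out-of-range access returns undefined, like a missing property.
  if (index < 0 || index >= str.length) return undefined;

  const first = str.charCodeAt(index);
  // A lead surrogate followed by a trail surrogate forms one supplementary code point.
  if (first >= 0xD800 && first <= 0xDBFF && index + 1 < str.length) {
    const second = str.charCodeAt(index + 1);
    if (second >= 0xDC00 && second <= 0xDFFF) {
      return (first - 0xD800) * 0x400 + (second - 0xDC00) + 0x10000;
    }
  }
  // BMP characters and unpaired or trailing surrogates are returned as-is.
  return first;
}

// Hypothetical iterator: steps over whole code points, so it never starts
// on the trail half of a well-formed surrogate pair.
function* codePoints(str) {
  for (let i = 0; i < str.length; ) {
    const cp = codePointAtSketch(str, i);
    yield cp;
    i += cp > 0xFFFF ? 2 : 1;
  }
}

const s = "a\uD801\uDC00b"; // "a", U+10400 (a Deseret capital letter), "b"
console.log(codePointAtSketch(s, 1).toString(16)); // 10400
console.log(codePointAtSketch(s, 4));              // undefined
console.log(s.charCodeAt(4));                      // NaN
console.log([...codePoints(s)].length);            // 3
```

Only direct indexing can hit a trail surrogate or run off the end of the string; the iterator avoids both cases by construction.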

