For reference, here's how Java's Pattern.UNICODE_CHARACTER_CLASS behaves (tested with Oracle 1.8.0_31 and OpenJDK 1.7.0_65):
foo\uD834bar and foo\uDC00bar both match ^foo[^a]bar$ and ^foo.bar$, so, in general, lone surrogates match /./.

Backreferences are allowed to consume the leading surrogate of a valid surrogate pair:

Ex1: foo\uD834bar\uD834\uDC00 matches foo(.+)bar\1

But surprisingly:

Ex2: \uDC00foobar\uD834\uDC00foobar\uD834 doesn't match ^(.+)\1$

...

So Ex2 behaves as if the input string were converted to UTF-32 before matching, while Ex1 behaves as if it definitely were not. I don't know what the correct mental model would be for both Ex1 and Ex2 to make sense.
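
In case anyone wants to reproduce the observations above, here is a minimal sketch. The class and helper names (SurrogateRegexDemo, finds, endOfFirstMatch) are made up for illustration. It also assumes the checks were done with Matcher.find(), since the Ex1 pattern is unanchored. The results may differ on other JDK versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of any library.
final class SurrogateRegexDemo {
    // The flag under discussion. It mainly changes predefined classes such as \w,
    // so the behaviour of '.' shown here is probably not specific to it.
    static final int FLAGS = Pattern.UNICODE_CHARACTER_CLASS;

    // True if the regex matches somewhere in the input (Matcher.find semantics).
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex, FLAGS).matcher(input).find();
    }

    // End offset, in UTF-16 code units, of the first match, or -1 if there is none.
    static int endOfFirstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex, FLAGS).matcher(input);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        // Lone surrogates (high U+D834, low U+DC00) versus '.' and a negated class.
        System.out.println(finds("^foo[^a]bar$", "foo\uD834bar"));
        System.out.println(finds("^foo.bar$", "foo\uDC00bar"));

        // Ex1: the backreference is compared code unit by code unit, so a match
        // may end between the two halves of the trailing surrogate pair.
        String ex1 = "foo\uD834bar\uD834\uDC00";
        System.out.println(finds("foo(.+)bar\\1", ex1));
        System.out.println(endOfFirstMatch("foo(.+)bar\\1", ex1));

        // Ex2: '.' advances by code point, so group 1 can never end between
        // the U+D834 and U+DC00 in the middle of the input.
        String ex2 = "\uDC00foobar\uD834\uDC00foobar\uD834";
        System.out.println(finds("^(.+)\\1$", ex2));
    }
}
```

If Ex1 does end inside the pair, the end offset printed for it will be one less than the input length, which is an easy way to see the split.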
_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss