Steven Levithan wrote:
> \w with Unicode should match [\p{L}\p{Nd}_]. The best way to go for
> [[:alnum:]], for compatibility reasons, would probably be
> [\p{Ll}\p{Lu}\p{Lt}\p{Nd}]. This difference could be argued as a positive
> (if you like that exact set) or a negative (many users will think it's
> equivalent to \w with Unicode even though it isn't).
Although some regex libraries indeed implement the above, I've just looked
over UTS#18 Annex C [1], which requires that \w be equivalent to:
[\p{Alphabetic}\p{M}\p{Nd}\p{Pc}]
Note that \p{Alphabetic} should include more than just \p{L}. I'm not clear
on whether the differences from \p{L} are fully covered by the inclusion of
\p{M} in the above character class. I'm sure there are plenty of people here
with greater Unicode expertise than me who could clarify, though.
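
One way to probe the question is to test a few code points against each candidate class. The following is only a sketch: it assumes an engine with ES2018 Unicode property escapes (the `u` flag), and the names `wordClasses`, `classify`, and `alphabeticNotLetterOrMark` are made up for illustration. The sample code points are my own choice. U+2160 ROMAN NUMERAL ONE has general category Nl, which the derived Alphabetic property includes, so it suggests that \p{M} alone does not account for every difference between \p{Alphabetic} and \p{L}.

```javascript
// Three candidate definitions of a Unicode-aware \w, written with
// ES2018 property escapes (the `u` flag is required for \p{...}).
const wordClasses = {
  // From the quoted message: letters, decimal digits, underscore.
  simple: /^[\p{L}\p{Nd}_]$/u,
  // The [[:alnum:]] candidate from the quoted message.
  alnum: /^[\p{Ll}\p{Lu}\p{Lt}\p{Nd}]$/u,
  // The UTS #18 Annex C definition of \w cited above.
  uts18: /^[\p{Alphabetic}\p{M}\p{Nd}\p{Pc}]$/u,
};

// Hypothetical helper: report which classes match a single code point.
function classify(ch) {
  const result = {};
  for (const [name, re] of Object.entries(wordClasses)) {
    result[name] = re.test(ch);
  }
  return result;
}

// Hypothetical helper: true when a code point is Alphabetic but is
// neither a letter (\p{L}) nor a mark (\p{M}).
function alphabeticNotLetterOrMark(ch) {
  return /^\p{Alphabetic}$/u.test(ch) && !/^[\p{L}\p{M}]$/u.test(ch);
}

// Sample code points: Latin letter, underscore, modifier letter (Lm),
// combining acute accent (Mn), Roman numeral one (Nl), undertie (Pc).
const samples = ["a", "_", "\u02B0", "\u0301", "\u2160", "\u203F"];
for (const ch of samples) {
  const cp = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(`U+${cp}`, classify(ch), alphabeticNotLetterOrMark(ch));
}
```

With these samples, only the UTS #18 class accepts the combining accent, the Roman numeral, and the undertie, while the [[:alnum:]] candidate also rejects the underscore and the modifier letter.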
-- Steven Levithan
[1]: http://unicode.org/reports/tr18/#Compatibility_Properties