Steven Levithan wrote:
\w with Unicode should match [\p{L}\p{Nd}_]. The best way to go for [[:alnum:]], for compatibility reasons, would probably be [\p{Ll}\p{Lu}\p{Lt}\p{Nd}]. This difference could be argued as a positive (if you like that exact set) or a negative (many users will think it's equivalent to \w with Unicode even though it isn't).

Although some regex libraries indeed implement the above, I've just looked over UTS#18 Annex C [1], which requires that \w be equivalent to:

[\p{Alphabetic}\p{M}\p{Nd}\p{Pc}]

Note that \p{Alphabetic} should include more than just \p{L}. I'm not clear on whether the differences from \p{L} are fully covered by the inclusion of \p{M} in the above character class. I'm sure there are plenty of people here with greater Unicode expertise than me who could clarify, though.
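
As a rough way to probe that question, the two character classes can be compared directly in an engine that supports Unicode property escapes. The sketch below assumes an ES2018-capable JavaScript engine (the `u` flag with \p{...}). The sample characters and the helper name compareWordClasses are illustrative choices only, not part of UTS#18 or of any proposal in this thread.

```javascript
// Sketch: compare the two candidate definitions of \w discussed above.
// Assumes ES2018 Unicode property escapes (\p{...} with the `u` flag).
const letterBased = /^[\p{L}\p{Nd}_]$/u;                    // the quoted definition
const uts18Word = /^[\p{Alphabetic}\p{M}\p{Nd}\p{Pc}]$/u;   // UTS#18 Annex C

// Alphabetic characters that are neither \p{L} nor \p{M}.
const alphabetic = /^\p{Alphabetic}$/u;
const letterOrMark = /^[\p{L}\p{M}]$/u;

// Hypothetical sample set, chosen for illustration only.
const samples = [
  ["a", "LATIN SMALL LETTER A"],
  ["\u0663", "ARABIC-INDIC DIGIT THREE"],
  ["_", "LOW LINE"],
  ["\u203F", "UNDERTIE"],
  ["\u0301", "COMBINING ACUTE ACCENT"],
  ["\u2160", "ROMAN NUMERAL ONE"],
];

// Hypothetical helper: report how each sample fares under both classes.
function compareWordClasses(chars) {
  return chars.map(([ch, name]) => ({
    name,
    letterBased: letterBased.test(ch),
    uts18Word: uts18Word.test(ch),
    alphabeticButNotLetterOrMark: alphabetic.test(ch) && !letterOrMark.test(ch),
  }));
}

for (const row of compareWordClasses(samples)) {
  console.log(row);
}
```

U+2160 (general category Nl) is a case worth checking: letter numbers are part of the derived Alphabetic property but fall outside both \p{L} and \p{M}, which suggests that \p{M} alone does not account for the whole difference.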

-- Steven Levithan

[1]: http://unicode.org/reports/tr18/#Compatibility_Properties

