On 5/20/16 10:13 AM, Martin Buchholz wrote:
On Fri, May 20, 2016 at 9:55 AM, Xueming Shen wrote:
I expected to see the general category Other ("C") in Character.java.

Can open an RFE for that if needed.
Well, don't we want complete correspondence between the Unicode standard,
Character, and regex?
Anything missing seems like a bug, not an RFE!

I'd like to see tests that \p{C} is the same as \p{Other}, which is the same as
\p{isOther}, and similarly for the other categories.
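A minimal sketch of the kind of consistency test requested above: for each spelling of a property, compile it with java.util.regex.Pattern and count the code points where it disagrees with a reference predicate built from Character.getType(). The class PropertyCheck and the method mismatches() are hypothetical names; the reference mask for C (Cc, Cf, Cs, Co, Cn) follows the Unicode definition of the grouping. No expected counts are given, since they depend on which names a given JDK accepts.

```java
import java.util.function.IntPredicate;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical test helper: compares a regex property with a getType() predicate. */
public final class PropertyCheck {
    private PropertyCheck() {}

    /**
     * Returns how many code points \p{property} classifies differently from
     * the reference predicate, or -1 if Pattern rejects the property name.
     */
    public static int mismatches(String property, IntPredicate reference) {
        Pattern p;
        try {
            p = Pattern.compile("\\p{" + property + "}");
        } catch (PatternSyntaxException e) {
            return -1; // name not supported by this JDK's Pattern
        }
        int count = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            // One code point per input string (a lone surrogate for U+D800..U+DFFF).
            boolean byRegex = p.matcher(new String(Character.toChars(cp))).matches();
            if (byRegex != reference.test(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Reference for the C grouping: Cc | Cf | Cs | Co | Cn.
        final int cMask = (1 << Character.CONTROL) | (1 << Character.FORMAT)
                | (1 << Character.SURROGATE) | (1 << Character.PRIVATE_USE)
                | (1 << Character.UNASSIGNED);
        IntPredicate isOther = cp -> (cMask & (1 << Character.getType(cp))) != 0;
        for (String prop : new String[] {"C", "IsC", "gc=C", "Other", "IsOther"}) {
            System.out.println("\\p{" + prop + "}: " + mismatches(prop, isOther));
        }
    }
}
```

A result of 0 means the spelling agrees with the reference everywhere; -1 flags a spelling that Pattern does not accept at all.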

Did you mean you want to add "long name" support for the Unicode general categories?
I expect \p{C} and \p{Other} and \p{isOther} all to work (haven't tried it).
Is that not a reasonable expectation?


I'm a big fan of regex Unicode support :-)

LC/L/M/N/P/S/Z/C are special gc values: they are "groupings of related gc values". While j.l.Character does support the general category via getType() == gc_xyz, it does not explicitly support such grouping values, for an obvious reason (1:2): getType() returns exactly one gc value per code point, while each code point also belongs to a grouping. We have various isXXXXX() methods, but they are not specified as equivalent to those gc groupings. So yes, it would be an RFE if you want them supported in the j.l.Character class (for example, via something like Character.isType(int cp, int type)).
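To make the grouping idea concrete, here is a minimal sketch of how a grouping check can be layered on top of getType() alone, using one bitmask per grouping. GcGroups and inGroup() are hypothetical names for illustration only; this is not the Character.isType(int, int) API suggested above, and the member lists are taken from the Unicode definitions of the groupings.

```java
import java.util.Map;

/** Hypothetical helper: gc-grouping membership derived from Character.getType(). */
public final class GcGroups {
    private GcGroups() {}

    // One bit per Character general-category constant (all constants are < 32).
    private static int mask(int... types) {
        int m = 0;
        for (int t : types) {
            m |= 1 << t;
        }
        return m;
    }

    // Each grouping value mapped to the set of gc values it covers.
    private static final Map<String, Integer> GROUPS = Map.of(
        "LC", mask(Character.UPPERCASE_LETTER, Character.LOWERCASE_LETTER,
                   Character.TITLECASE_LETTER),
        "L", mask(Character.UPPERCASE_LETTER, Character.LOWERCASE_LETTER,
                  Character.TITLECASE_LETTER, Character.MODIFIER_LETTER,
                  Character.OTHER_LETTER),
        "M", mask(Character.NON_SPACING_MARK, Character.ENCLOSING_MARK,
                  Character.COMBINING_SPACING_MARK),
        "N", mask(Character.DECIMAL_DIGIT_NUMBER, Character.LETTER_NUMBER,
                  Character.OTHER_NUMBER),
        "P", mask(Character.CONNECTOR_PUNCTUATION, Character.DASH_PUNCTUATION,
                  Character.START_PUNCTUATION, Character.END_PUNCTUATION,
                  Character.INITIAL_QUOTE_PUNCTUATION,
                  Character.FINAL_QUOTE_PUNCTUATION, Character.OTHER_PUNCTUATION),
        "S", mask(Character.MATH_SYMBOL, Character.CURRENCY_SYMBOL,
                  Character.MODIFIER_SYMBOL, Character.OTHER_SYMBOL),
        "Z", mask(Character.SPACE_SEPARATOR, Character.LINE_SEPARATOR,
                  Character.PARAGRAPH_SEPARATOR),
        "C", mask(Character.CONTROL, Character.FORMAT, Character.SURROGATE,
                  Character.PRIVATE_USE, Character.UNASSIGNED));

    /** True if the code point's single gc value falls inside the named grouping. */
    public static boolean inGroup(int codePoint, String group) {
        Integer m = GROUPS.get(group);
        if (m == null) {
            throw new IllegalArgumentException("unknown grouping: " + group);
        }
        return (m & (1 << Character.getType(codePoint))) != 0;
    }
}
```

For example, inGroup(0x0378, "C") is true because an unassigned code point (Cn) belongs to the C grouping. The bitmask keeps getType() as the single source of truth, so a grouping can never disagree with the per-code-point gc value. Map.of requires Java 9 or later.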

Though support for many properties has been added to j.u.regex, it is still not complete. Names such as "Other" and "Number" are not supported yet. "Letter" is OK because it falls back to the same-named POSIX property. The supported "property name" list is limited to the names listed in CharPredicates.defUProp. It should be fine to add "Other" and the other long names for the Unicode gc values, but it appears we might have a namespace conflict for "letter" (POSIX or Unicode). It should be safe to use \p{gc=xyz}. Again,
it is more like an RFE now :-)
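Since \p{gc=xyz} is the safe spelling, one possible application-side workaround is to translate Unicode long names into that form before compiling. The sketch below assumes a hypothetical class LongGcNames; its alias table covers only the grouping values (long names as in the Unicode PropertyValueAliases data), and its normalization is a rough approximation of Unicode loose matching (ignore case, whitespace, underscores, hyphens), not a full implementation.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical workaround: compile gc long names via the \p{gc=xyz} form. */
public final class LongGcNames {
    private LongGcNames() {}

    // Normalized long name -> short gc grouping value.
    private static final Map<String, String> ALIASES = Map.of(
        "casedletter", "LC",
        "letter", "L",
        "mark", "M",
        "number", "N",
        "punctuation", "P",
        "symbol", "S",
        "separator", "Z",
        "other", "C");

    // Approximate loose matching: drop whitespace, '_' and '-', then lowercase.
    private static String normalize(String name) {
        return name.replaceAll("[\\s_-]", "").toLowerCase(Locale.ROOT);
    }

    /** Compiles \p{gc=X} for the long name, e.g. "Other" -> \p{gc=C}. */
    public static Pattern compile(String longName) {
        String shortName = ALIASES.get(normalize(longName));
        if (shortName == null) {
            throw new IllegalArgumentException("unknown gc long name: " + longName);
        }
        return Pattern.compile("\\p{gc=" + shortName + "}");
    }
}
```

Because the translated pattern always uses the gc= keyword, "Letter" here unambiguously means the Unicode grouping L, which sidesteps the POSIX/Unicode "letter" conflict mentioned above.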

-Sherman
