On Friday, 4 January 2013 at 22:00:02 UTC, Dmitry Olshansky wrote:
05-Jan-2013 00:51, monarch_dodra wrote:
Anyways, those 4 CUNEIFORM aside, what do you make of the
entries in Lo:
http://unicode.org/cldr/utility/character.jsp?a=F96B
These appear to be numeric, but aren't inside Nd/No/Nl. They
should return true to isNumber, no?

Hmmm. Take a look here:
http://unicode.org/cldr/utility/properties.jsp

There is a section called Numeric that has 3 properties,
and then there is a General section.
The General has Category which in turn has 'Number' category.

Bottom line is that I believe std.uni's isXXX functions query the general category of a symbol and not some other property. Let any mismatches between the other properties and the general category be the consortium's headache.

Maybe isNumber's "documented behavior" is wrong?

Problem is I can't come up with a good description of some other behavior. Maybe this one: [^[:Numeric_Type=None:]]
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5E%5B%3ANumeric_Type%3DNone%3A%5D%5D&g=

Sounds like the root of the problem is that isNumber != Numeric_Type[Decimal, Digit, Numeric]

Ergo, there is no correlation between isNumber and numericValue.
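
To make the mismatch concrete, here is a minimal sketch in Python rather than D, since Python's standard `unicodedata` module exposes both properties. The helper names `is_number_by_category`, `numeric_value` and `describe` are made up for illustration and are not std.uni functions. It assumes a category-based check (Nd/Nl/No only), as described above, and puts it next to the Numeric_Value lookup:

```python
import unicodedata

def is_number_by_category(ch: str) -> bool:
    """True when the General_Category is Nd, Nl or No (hypothetical helper)."""
    return unicodedata.category(ch) in ("Nd", "Nl", "No")

def numeric_value(ch: str):
    """Numeric_Value as a float, or None when the character has none."""
    return unicodedata.numeric(ch, None)

def describe(ch: str):
    """(general category, category-based answer, numeric value) for one character."""
    return (unicodedata.category(ch), is_number_by_category(ch), numeric_value(ch))

# DIGIT FIVE, ROMAN NUMERAL EIGHT, CJK ideograph "three",
# the compatibility ideograph from the link above, and a plain letter.
for ch in ("5", "\u2167", "\u4e09", "\uf96b", "a"):
    print(f"U+{ord(ch):04X}", describe(ch))
```

Characters in Lo can carry a numeric value while a category-based check rejects them, which is the gap between the two questions.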

Feels like there is a lot missing from std.uni, but at the same time, Unicode is really huge.

At the very least, I think we should have Category enum, along with a (get) "category" function.
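
A rough sketch of what such a pair could look like, again in Python for illustration only. The names `Category`, `category` and `is_number` are hypothetical and do not describe an existing std.uni API. The enum lists the 30 General_Category values, and the lookup delegates to `unicodedata.category`:

```python
import enum
import unicodedata

# The 30 General_Category abbreviations defined by Unicode.
Category = enum.Enum(
    "Category",
    "Lu Ll Lt Lm Lo Mn Mc Me Nd Nl No "
    "Pc Pd Ps Pe Pi Pf Po Sm Sc Sk So "
    "Zs Zl Zp Cc Cf Cs Co Cn",
)

def category(ch: str) -> Category:
    """Return the general category of a single character as an enum member."""
    return Category[unicodedata.category(ch)]

def is_number(ch: str) -> bool:
    """A category-based predicate built on top of category()."""
    return category(ch) in (Category.Nd, Category.Nl, Category.No)

print(category("A"), category("5"), category("\u4e09"))
```

With a getter like this, the isXXX predicates become thin wrappers, and callers who need a finer distinction can inspect the category directly.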

I was just saying to jmdavis in the pull that std.ascii had "isDigit", but that std.uni didn't. In truth, both also lack isDecimal and isNumeric.

There would just be a bit of ambiguity now between the broad "isNumeric", and "all the chars that have a numeric value"... :/
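
For comparison, Python's built-in string methods already split along the Numeric_Type tiers: `isdecimal` covers Decimal, `isdigit` covers Decimal and Digit, and `isnumeric` covers Decimal, Digit and Numeric. The sketch below only shows how the three tiers differ on a few characters; the helper name `numeric_tiers` is made up, and nothing here implies std.uni should adopt the same naming:

```python
def numeric_tiers(ch: str) -> dict:
    """Report which Numeric_Type-based predicates accept a character."""
    return {
        "decimal": ch.isdecimal(),  # Numeric_Type=Decimal
        "digit": ch.isdigit(),      # Decimal or Digit
        "numeric": ch.isnumeric(),  # Decimal, Digit or Numeric
    }

# DIGIT FIVE, SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF, and a plain letter.
for ch in ("5", "\u00b2", "\u00bd", "a"):
    print(f"U+{ord(ch):04X}", numeric_tiers(ch))
```

The broadest tier is the one that lines up with "has a numeric value", which is where the naming clash with a category-based isNumber would come from.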

Damn. Unicode is complicated.

Anyways, taking my weekend break.
