On Friday, 4 January 2013 at 22:00:02 UTC, Dmitry Olshansky wrote:
05-Jan-2013 00:51, monarch_dodra wrote:
Anyways, those 4 CUNEIFORM aside, what do you make of the
entries in Lo:
http://unicode.org/cldr/utility/character.jsp?a=F96B
These appear to be numeric, but aren't inside Nd/No/Nl. They
should return true to isNumber, no?
Hmmm. Take a look here:
http://unicode.org/cldr/utility/properties.jsp
There is a section called Numeric that has 3 properties,
and then there is a General section.
The General has Category which in turn has 'Number' category.
Bottom line is that I believe that std.uni isXXX queries the
general category of a symbol and not some other property. Let any
mismatches between the other properties and the general category be
the consortium's headache.
Maybe isNumber's "documented behavior" is wrong?
Problem is I can't come up with a good description of some
other behavior. Maybe this one [^[:Numeric_Type=None:]]
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5E%5B%3ANumeric_Type%3DNone%3A%5D%5D&g=
Sounds like the root of the problem is that isNumber !=
Numeric_Type[Decimal, Digit, Numeric]
Ergo, there is no correlation between isNumber and numericValue.
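
To make the mismatch concrete, here is a rough sketch in Python rather than D, only because its unicodedata module exposes both the general category and the numeric value. The helper names are mine and purely illustrative; this is not how std.uni implements anything.

```python
import unicodedata

def is_number_by_category(ch: str) -> bool:
    """Category-based test: true when the general category is Nd, Nl or No."""
    return unicodedata.category(ch).startswith("N")

def numeric_value_or_none(ch: str):
    """Property-based test: the character's numeric value, or None if it has none."""
    return unicodedata.numeric(ch, None)

# DIGIT FIVE, ROMAN NUMERAL EIGHT, the Lo entry linked above, and a plain letter.
for ch in ("5", "\u2167", "\uF96B", "A"):
    print(f"U+{ord(ch):04X}",
          unicodedata.category(ch),
          is_number_by_category(ch),
          numeric_value_or_none(ch))
```

U+F96B is the interesting row: its category is Lo, so the category-based test rejects it, yet it still carries a numeric value.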
Feels like there is a lot missing from std.uni, but at the same
time, Unicode is really huge.
At the very least, I think we should have Category enum, along
with a (get) "category" function.
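
Something along these lines is what I have in mind, sketched in Python for brevity. The enum, its member names and the function are all hypothetical; a real std.uni version would presumably also expose the two-letter subcategories (Nd, Nl, No, ...).

```python
import unicodedata
from enum import Enum

class Category(Enum):
    """Hypothetical enum of the major Unicode general categories."""
    LETTER = "L"
    MARK = "M"
    NUMBER = "N"
    PUNCTUATION = "P"
    SYMBOL = "S"
    SEPARATOR = "Z"
    OTHER = "C"

def category(ch: str) -> Category:
    """Return the major general category of a single character."""
    # unicodedata.category returns a two-letter code such as "Nd" or "Lo";
    # the first letter identifies the major category.
    return Category(unicodedata.category(ch)[0])
```

With that in place, an isNumber-style query is just `category(ch) is Category.NUMBER`, which makes it explicit that the answer comes from the category and nothing else.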
I was just saying to jmdavis in the pull that std.ascii had
"isDigit", but that std.uni didn't. In truth, both also lack
isDecimal and isNumeric.
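
For comparison, Python's str type already has this three-way split, and as far as I understand its documentation the three methods follow Numeric_Type (Decimal, then Digit or Decimal, then any of the three). A small sketch; numeric_tiers is just an illustrative helper of mine:

```python
def numeric_tiers(ch: str):
    """Return (isdecimal, isdigit, isnumeric) for a single character."""
    return (ch.isdecimal(), ch.isdigit(), ch.isnumeric())

# DIGIT FIVE, SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF,
# the Lo ideograph linked above, and a plain letter.
for ch in ("5", "\u00B2", "\u00BD", "\uF96B", "A"):
    print(f"U+{ord(ch):04X}", numeric_tiers(ch))
```

Each tier is a superset of the previous one, and none of them consults the general category.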
There would just be a bit of ambiguity now between the broad
"isNumeric", and "all the chars that have a numeric value"... :/
Damn. Unicode is complicated.
Anyways, taking my weekend break.