Re: numericValue for (unicode) characters

Dmitry Olshansky Fri, 04 Jan 2013 12:35:37 -0800

04-Jan-2013 21:48, monarch_dodra writes:

On Friday, 4 January 2013 at 13:18:48 UTC, Dmitry Olshansky wrote:

04-Jan-2013 15:58, Jonathan M Davis writes:

On Thursday, January 03, 2013 20:40:47 monarch_dodra wrote:

So... do we agree on
ascii: int - not found => -1
uni: double - not found => nan
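The two conventions under discussion can be sketched as follows. This is Python rather than D, and `ascii_numeric_value` / `uni_numeric_value` are hypothetical stand-ins for the proposed std.ascii and std.uni functions. The Unicode lookup relies on Python's `unicodedata.numeric`, which returns an optional default instead of raising when a character has no numeric value.

```python
import math
import unicodedata


def ascii_numeric_value(c: str) -> int:
    """Hypothetical std.ascii-style lookup: digit value, or -1 if c is not '0'-'9'."""
    if len(c) == 1 and "0" <= c <= "9":
        return ord(c) - ord("0")
    return -1  # "not found" sentinel; an int has no NaN to use instead


def uni_numeric_value(c: str) -> float:
    """Hypothetical std.uni-style lookup: numeric value, or NaN if c has none."""
    # With a default argument, unicodedata.numeric returns it instead of raising ValueError.
    return unicodedata.numeric(c, math.nan)


print(ascii_numeric_value("7"), ascii_numeric_value("x"))
print(uni_numeric_value("\u00bd"))         # VULGAR FRACTION ONE HALF
print(math.isnan(uni_numeric_value("A")))  # 'A' has no numeric value, so NaN
```

One consequence of the split: NaN never compares equal to anything, itself included, so callers of the Unicode variant must test with `isnan` rather than `== nan`, while the ASCII sentinel can simply be tested with `< 0`.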


I'm not a fan of the ASCII version returning -1, but I don't really have a
better suggestion. I suppose that you could throw instead, but I don't know if
that's a good idea or not. It _would_ be more consistent with our other
conversion functions, however.

- Jonathan M Davis


I find low-level stuff that throws to be overly awkward to deal with
(not to mention performance problems).

Hm... I've found a brilliant primitive, Expected!T, that could be of
great help with the error-codes-vs-exceptions problem. See Andrei's recent
talk, which went live not long ago:

http://channel9.msdn.com/Shows/Going+Deep/C-and-Beyond-2012-Andrei-Alexandrescu-Systematic-Error-Handling-in-C


Time to put the analogous stuff into Phobos?
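The idea behind Expected from the talk is a return value that carries either a result or the exception that prevented computing it, so the caller can check cheaply or let it throw on access. A minimal Python sketch of that idea follows; the names `Expected`, `from_exception`, `valid`, `get` and `digit_value` are hypothetical, only loosely modeled on the talk, and are not a Phobos API.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class Expected(Generic[T]):
    """Hypothetical sketch: holds either a value or the exception explaining its absence."""

    def __init__(self, value: Optional[T] = None, error: Optional[BaseException] = None):
        self._value = value
        self._error = error

    @classmethod
    def from_exception(cls, error: BaseException) -> "Expected[T]":
        return cls(error=error)

    def valid(self) -> bool:
        # Error-code style: a cheap check, no exception is raised.
        return self._error is None

    def get(self) -> T:
        # Exception style: the stored error is raised only if the caller insists on a value.
        if self._error is not None:
            raise self._error
        return self._value  # type: ignore[return-value]


def digit_value(c: str) -> Expected[int]:
    """Hypothetical lookup returning Expected instead of -1 or throwing."""
    if len(c) == 1 and "0" <= c <= "9":
        return Expected(ord(c) - ord("0"))
    return Expected.from_exception(ValueError(f"{c!r} is not an ASCII digit"))


print(digit_value("x").valid())  # checked without any exception
print(digit_value("4").get())
```

The same function then serves both camps in this thread: low-level callers test `valid()`, while callers that prefer exceptions just call `get()`.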


I finished an implementation:

https://github.com/D-Programming-Language/phobos/pull/1052

It is not "pull ready", so we can still discuss it.

Well, for a start it features tons of code duplication. But I'm replacing the whole std.uni anyway...

I raised a couple of issues in the pull, which I'll copy here:

//----
I did run into a couple of issues, namely that I'm not getting 100%
equivalence between chars that are numeric and chars that have a numeric
value... Is this normal...?


Yes, it's called Unicode ;)

* There are a fair number of chars that have a numeric value but aren't
isNumber. I think they might be new in 6.1.0, but I'm not sure. I
decided it was best to have them return nan, instead of having
inconsistent behavior.


You might also be using 6.2; it was released in the fall of 2012.

* There are a couple of characters in tableLo that have numeric values. These
aren't considered by isNumber either. I think this might be a bug, though.
* There are 4 "non-number numeric" characters in "CUNEIFORM NUMERIC
SIGN". These return wild values, and in particular two of them return
-1. I *think* these should actually return nan for us, because (AFAIK)
-1 is just a wild placeholder for "invalid" :/

Some have a numeric value of '-1', I think. The truth of the matter is that, as usual with Unicode, things are rather complicated. So 'numeric character' is a (general) category, and 'has numeric value' is a separate property of a code point that may or may not correlate directly with the category.

Thus I think (looking ahead to your other post) that isNumber is correct, as it follows its documented behavior.
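The distinction between the general category and the numeric-value property can be checked with Python's `unicodedata`, which exposes both. A sketch, assuming isNumber tests general categories Nd, Nl and No; note that Python bundles a newer Unicode version than the 6.x data discussed here and also folds in numeric values from the Unihan database, so its counts will not match std.uni's tables. The helper names are hypothetical.

```python
import sys
import unicodedata


def is_number(c: str) -> bool:
    """General-category test (assumed analogue of isNumber): Nd, Nl or No."""
    return unicodedata.category(c) in ("Nd", "Nl", "No")


def has_numeric_value(c: str) -> bool:
    """Numeric-value property test: True if the database assigns c a value."""
    return unicodedata.numeric(c, None) is not None


# U+4E00 (CJK ideograph "one") is a letter (Lo), yet it has a numeric value.
print(unicodedata.category("\u4e00"), unicodedata.numeric("\u4e00"))

# Count code points where the two properties disagree, in either direction.
value_not_number = number_no_value = 0
for cp in range(sys.maxunicode + 1):
    c = chr(cp)
    n, v = is_number(c), has_numeric_value(c)
    if v and not n:
        value_not_number += 1
    elif n and not v:
        number_no_value += 1
print(value_not_number, number_no_value)
```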


Maybe we should just return -1 on invalid unicode? Or maybe it's just my
input file:
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
It doesn't have a separate field for isNumber/numericValue, so it is
forced to write a wild number. Maybe these four chars should return nan?


Nope. Does letter 'A' return a wild number?
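The point here is that an empty numeric field already means "no value" (as for 'A'), so a -1 in that field is data rather than an error marker. A sketch of reading the field from a UnicodeData.txt record, assuming the documented layout of 15 semicolon-separated fields, with field 2 the general category and field 8 the numeric value (possibly a fraction such as 1/2). The function and type names are hypothetical, and the sample records are written in that format for illustration.

```python
from fractions import Fraction
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    code_point: int
    category: str
    numeric: Optional[Fraction]  # None means "no numeric value at all"


def parse_unicode_data_line(line: str) -> Entry:
    """Hypothetical parser for one UnicodeData.txt record."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    raw = fields[8]
    # An empty field means no numeric value; anything else, "-1" included,
    # is a genuine value and is parsed as an exact rational.
    numeric = Fraction(raw) if raw else None
    return Entry(int(fields[0], 16), fields[2], numeric)


for rec in (
    "0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;",
    "00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
):
    print(parse_unicode_data_line(rec))
```

Using `Fraction` keeps values like 1/2 exact; converting to a double (or NaN for `None`) can happen at the API boundary.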

//----

Oh yeah, I also added isNumber to std.ascii. Feels wrong to not have it
if we have numericValue.



--
Dmitry Olshansky
