On Thu, May 17, 2018 at 12:36:35PM +0200, Hans Åberg wrote:
> 
> > On 17 May 2018, at 11:02, Joerg Schilling 
> > <joerg.schill...@fokus.fraunhofer.de> wrote:
> > 
> > Hans Åberg <haber...@telia.com> wrote:
> > 
> >>>> |I asked a person who speaks japanese and he told me that
> >>>> |
> >>>> | "\u4e00\u4e8c\u4e09"
> >>>> |
> >>>> |is similar to
> >>>> |
> >>>> | "one two three"
> >>>> |
> >>>> |and this is not used for computing.
> >>>> 
> >>>> If i recall correctly this has been discussed already; if not here
> >>>> then on the Unicode list.  Unicode brings quite a lot of
> >>>> codepoints, like CIRCLED DIGIT ONE, PARENTHESIZED DIGIT ONE, DIGIT
> >>>> ONE FULL STOP etc.  All these are marked "No", and i think the
> >>>> discussion concluded that they should not be taken into account
> >>>> when converting strings to numbers.
> >> 
> >> The intent may be that the value of the digit character c can be computed 
> >> by the expression c - '0' when >= 0 and <= 9, and is otherwise a 
> >> non-digit. Then 'isdigit' and [[:digit:]] are tied to that, so it is 
> >> impossible to use any other decimal digits.
> > 
> > This seems to be an important idea, as this japanese one two three
> > is not in a contiguous order.
> 
> It provides an efficient implementation, important on earlier computers. The 
> UTF-8 article [1], "History", mentions that they struggled around 1992 to 
> find proposals for that providing efficient implementations.
> 
> 1. https://en.wikipedia.org/wiki/UTF-8

Oh, well. You should be able to implement efficient code for the specs
from 14652 and 30112. One way would be that, after testing with isdigit,
you index into a table of 4-bit entries that holds the binary value
corresponding to each digit character. This is probably on par speedwise
with subtracting the value for zero.
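
A minimal sketch of that table lookup in C, assuming single-byte
characters and two 4-bit entries packed per byte. The names
digit_nibbles, set_digit_value, init_digit_table and digit_value are
made up for illustration and do not come from 14652 or 30112. Only the
ASCII digits are filled in here; a locale with other digit characters
would add its own entries.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical packed table: one 4-bit value per single-byte character,
   two entries per byte, so 256 characters fit in 128 bytes. */
static unsigned char digit_nibbles[128];

/* Store a 4-bit value for character code c. */
static void set_digit_value(unsigned char c, unsigned value)
{
    assert(value < 16);
    unsigned shift = (c & 1u) ? 4u : 0u;
    digit_nibbles[c >> 1] = (unsigned char)
        ((digit_nibbles[c >> 1] & ~(0xFu << shift)) | (value << shift));
}

/* Fill in the entries for the ASCII digits (assumption: a locale with
   additional digit characters would register them the same way). */
static void init_digit_table(void)
{
    for (unsigned v = 0; v < 10; v++)
        set_digit_value((unsigned char)('0' + v), v);
}

/* Return the numeric value of c, or -1 if c is not a digit. */
static int digit_value(int c)
{
    if (c < 0 || c > 255 || !isdigit(c))
        return -1;
    return (digit_nibbles[c >> 1] >> ((c & 1) ? 4 : 0)) & 0xF;
}
```

After calling init_digit_table() once, digit_value('7') gives 7 and
digit_value('a') gives -1. The lookup costs one shift, one load and one
mask, which is why it should be in the same range as c - '0'.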

Best regards
keld
