On Thu, May 17, 2018 at 12:36:35PM +0200, Hans Åberg wrote:
>
> > On 17 May 2018, at 11:02, Joerg Schilling
> > <joerg.schill...@fokus.fraunhofer.de> wrote:
> >
> > Hans Åberg <haber...@telia.com> wrote:
> >
> >>>> |I asked a person who speaks japanese and he told me that
> >>>> |
> >>>> | "\u4e00\u4e8c\u4e09"
> >>>> |
> >>>> |is similar to
> >>>> |
> >>>> | "one two three"
> >>>> |
> >>>> |and this is not used for computing.
> >>>>
> >>>> If i recall correctly this has been discussed already; if not here
> >>>> then on the Unicode list. Unicode brings quite a lot of
> >>>> codepoints, like CIRCLED DIGIT ONE, PARENTHESIZED DIGIT ONE, DIGIT
> >>>> ONE FULL STOP etc. All these are marked "No", and i think the
> >>>> discussion concluded that they should not be taken into account
> >>>> when converting strings to numbers.
> >>
> >> The intent may be that the value of the digit character c can be computed
> >> by the expression c - '0' when >= 0 and <= 9, and is otherwise a
> >> non-digit. Then 'isdigit' and [[:digit:]] are tied to that, so it is
> >> impossible to use any other decimal digits.
> >
> > This seems to be an important idea, as this japanese one two three
> > is not in a contiguous order.
>
> It provides an efficient implementation, important on earlier computers. The
> UTF-8 article [1], "History", mentions that they struggled around 1992 to
> find proposals for that providing efficient implementations.
>
> 1. https://en.wikipedia.org/wiki/UTF-8

Oh, well. You should be able to implement efficient code for the specs from
ISO/IEC TR 14652 and ISO/IEC TS 30112. One way would be, after testing with
isdigit, to index into a table of 4-bit entries that holds the binary value
corresponding to each digit character. This is probably on par speedwise
with subtracting the value for zero.
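
A minimal sketch of such a lookup in C, assuming a single-byte,
ASCII-compatible character set. The names digit_nibbles and digit_value are
hypothetical and come from neither spec. A table for a locale with other digit
characters would be filled the same way.

```c
#include <ctype.h>

/* 256 entries of 4 bits each, packed two per byte (128 bytes total).
 * Assumption: ASCII layout, so '0'..'9' are 0x30..0x39 and land in
 * bytes 0x18..0x1C. The even character code uses the low nibble and
 * the odd character code uses the high nibble. */
static const unsigned char digit_nibbles[128] = {
    [0x18] = 0x10, /* '0' -> 0, '1' -> 1 */
             0x32, /* '2' -> 2, '3' -> 3 */
             0x54, /* '4' -> 4, '5' -> 5 */
             0x76, /* '6' -> 6, '7' -> 7 */
             0x98  /* '8' -> 8, '9' -> 9 */
};

/* Return the numeric value of digit character c, or -1 if c is not
 * a digit. The isdigit test comes first; the table is only consulted
 * for characters that passed it. */
int digit_value(int c)
{
    if (c < 0 || c > 255 || !isdigit(c))
        return -1;
    unsigned char packed = digit_nibbles[c >> 1];
    return (c & 1) ? (packed >> 4) : (packed & 0x0F);
}
```

With this table, digit_value('7') gives 7 and digit_value('x') gives -1. The
cost per digit is one shift, one load and one mask, against one subtraction
for c - '0'.
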
Best regards
keld