Thanks for this, although it raises another question :-) If the char to be converted is 0661, say, what will the subtraction yield? Will it be 0661 - 0660 or 0661 - 0030? I assume that a literal '0' always maps to 0030, rather than the runtime cleverly detecting which range of digits the char belongs to.

Richard

-----Original Message-----
From: Jonathan Pryor [mailto:[EMAIL PROTECTED]
Sent: 05 October 2004 23:53
To: Polton, Richard (IT)
Cc: Jambunathan Jambunathan; [EMAIL PROTECTED]
Subject: RE: [Mono-list] conversions

A quick perusal through Perl's "Category.pl" shows this:

(1) Numbers are categorized as "Nd"
(2) The only ranges that are "Nd" seem to be:

    0030 - 0039  '0' - '9'
    0660 - 0669  ARABIC-INDIC DIGIT 0 - 9 (same order as ASCII)
    06F0 - 06F9  EXTENDED ARABIC-INDIC DIGIT 0-9 ("")
    0966 - 096F  DEVANAGARI DIGIT 0-9
    09E6 - 09EF  BENGALI DIGIT 0-9
    0A66 - 0A6F
    0AE6 - 0AEF
    0B66 - 0B6F
    0BE7 - 0BEF
    0C66 - 0C6F
    0CE6 - 0CEF
    0D66 - 0D6F
    0E50 - 0E59
    0ED0 - 0ED9
    0F20 - 0F29
    ... Plus 8 more...

I'm too lazy to look at all of these ranges, but the ones I did look at
all had digits in the order 0..9. The subtraction should be legal for
all of these glyphs. (Which is probably by design; it would be very odd
-- broken? -- to have so many digits in the "right" order, and then have
a few in a different order...)

Gnome's Character Map program (gucharmap) is very handy for looking up
the Unicode Category a character belongs to. Too bad the opposite
direction (Unicode Category -> characters) tends to be more difficult
(hence consulting Perl's internal tables).

 - Jon

On Tue, 2004-10-05 at 07:31, Polton, Richard (IT) wrote:
> Thanks for this. Is it fair to say, then, that only Arabic numerals
> are counted as digits? Even though other numeric characters have
> integer values?
>
> -----Original Message-----
> From: Jonathan Pryor [mailto:[EMAIL PROTECTED]
> Sent: 05 October 2004 11:32
> To: Polton, Richard (IT)
> Cc: Jambunathan Jambunathan; [EMAIL PROTECTED]
> Subject: RE: [Mono-list] conversions
>
> On Tue, 2004-10-05 at 04:34, Polton, Richard (IT) wrote:
> > In fact, having given it further thought, I have a couple of
> > questions:
> >
> > i) if I sit at a Japanese terminal (for example) and enter '-', i.e.
> > ichi or 'one', is this a valid Unicode character?
>
> Yes.
>
> > ii) how wide is the 'char' datatype? I assume it contains Unicode
> > rather than single-byte ASCII.
>
> 16-bit unsigned value. It supports Unicode.
>
> > iii) if entering 'ichi' is valid, and char contains Unicode, then I
> > suspect that the below subtraction will return a number substantially
> > greater than one.
>
> No. At least, not if it's remotely like CVS HEAD:
>
>     public static int Val (char Expression) {
>         if (char.IsDigit(Expression)) {
>             return Expression - '0';
>         }
>         else {
>             throw new ArgumentException();
>         }
>     }
>
> Ichi isn't a digit, so it will generate an ArgumentException.
>
> (Assuming that Ichi is Unicode U+4E00, which certainly looks like '-'.
> It's in the Unicode category "Letter, Other".)
>
> The subtraction should be safe, as (1) it's only done on digits, and
> (2) Unicode follows the ASCII character ordering (for glyphs 0-127),
> which permits this subtraction.
>
>  - Jon

--------------------------------------------------------

NOTICE: If received in error, please destroy and notify sender. Sender does not waive confidentiality or privilege, and use is prohibited.

_______________________________________________
Mono-list maillist - [EMAIL PROTECTED]
http://lists.ximian.com/mailman/listinfo/mono-list
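
To make the question about 0661 concrete, here is a minimal C# sketch of what the subtraction in the quoted Val method would produce for a non-ASCII digit. The class and method names (DigitDemo, RawDifference, DigitValue) are made up for illustration and are not part of Mono; the use of CharUnicodeInfo.GetDecimalDigitValue is only one possible way to get a 0..9 value, not necessarily what the Mono runtime does or should do.

```csharp
using System;
using System.Globalization;

public static class DigitDemo
{
    // Hypothetical helper: the same arithmetic as the quoted Val(),
    // i.e. the code-point distance from ASCII '0' (U+0030).
    public static int RawDifference(char c)
    {
        return c - '0';
    }

    // Hypothetical helper: a digit value in 0..9 for any character in
    // Unicode category Nd, looked up from the character tables
    // instead of computed by subtraction.
    public static int DigitValue(char c)
    {
        if (!char.IsDigit(c))
            throw new ArgumentException("not a decimal digit", "c");
        return CharUnicodeInfo.GetDecimalDigitValue(c);
    }

    public static void Main()
    {
        char arabicOne = '\u0661'; // ARABIC-INDIC DIGIT ONE

        Console.WriteLine(char.IsDigit(arabicOne));   // True
        Console.WriteLine(RawDifference(arabicOne));  // 1585 (0x0661 - 0x0030)
        Console.WriteLine(DigitValue(arabicOne));     // 1
        Console.WriteLine(char.IsDigit('\u4E00'));    // False (category Lo)
    }
}
```

In other words, a literal '0' is always U+0030, so IsDigit followed by a plain subtraction only gives 0..9 for the ASCII range; for the other Nd ranges the base would have to be that range's own zero, or the value would have to come from a table lookup as sketched above.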
