http://d.puremagic.com/issues/show_bug.cgi?id=5543
--- Comment #10 from monarchdo...@gmail.com 2012-12-21 07:53:36 PST ---
(In reply to comment #5)
> I'm wrapping up a revamp of std.uni that makes it a piece of cake to create
> character sets. And maps are converted to multi-staged tables that are faster
> than binary search on a large set. I'd suggest to wait a bit on it (so as
> not to duplicate work) and introduce only std.ascii version as the most
> useful.
>
> The ongoing polishing, fixing and testing against ICU is going on here:
> https://github.com/blackwhale/gsoc-bench-2012

OK.

The trouble I was having, though, is that the existing binary search returns a
bool, whereas I need the actual entry, so that I can do "value - entry[0]",
e.g.:

//----
static immutable dchar[2][] table1 = [
    [ 0x0030, 0x0039], //
    [ 0x0660, 0x0669], //ARABIC-INDIC
    [ 0x06F0, 0x06F9], //EXTENDED ARABIC-INDIC
    ...
//---

That's because all the entries in [Nd] are runs of consecutive numerals
starting at 0. I can also cram in a select couple of entries from [Nl] and
[No] that use the same scheme.

So if I have the code point 0x0665 (the ARABIC-INDIC digit '5'), I'd want to
find [ 0x0660, 0x0669], and then "return 0x0665 - 0x0660". I don't need the
entire pair, but I do need at least the lhs of the pair.

If you could keep that in mind during your re-write. Or not. Just throwing it
out there.

For all other entries in [Nl] and [No], I'd have:

static immutable dchar[2][] table1 = [
    [ 0x216D, 100], //ROMAN NUMERAL ONE HUNDRED

So that's just a basic dictionary. But I don't think you can statically
allocate an AA. So yeah, just throwing that in your direction too.

> > The file is too large for std.xml to handle, so it's back to C++ for me :/
>
> http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
>
> Same thing but no useless XML trash. Description of fields is somewhere in
> the middle of this document
> http://www.unicode.org/reports/tr44/

Nice, TY.

> > The only questions I have is:
> > Return value: int or double?
>
> Should be rational to accurately represent things like "1/5" character ;)
> I do suspect some simple custom type could do (2 shorts packed in one struct
> etc.).
>
> > Input is not numeric: -1 or exception?
>
> -1 is fine I think as this is rather low level (per character) and it's not
> at all convenient to throw (and then catch).

The only issue I have with returning -1 is that it is a magic value. The fact
that no Unicode character has the numeric value -1 is pure coincidence, not
design. In particular, any attempt to write "if (numericValue(c) < 0) fail"
would also be wrong, because:
http://unicode.org/cldr/utility/character.jsp?a=0F33
TIBETAN DIGIT HALF ZERO returns -0.5.

Do we *really* want to standardize the syntax of
"if (numericValue(c) < -0.7)"?

... Damn you, Unicode!
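
To make the "value - entry[0]" request concrete, here is a minimal sketch of
the kind of lookup described above. It is only an illustration under stated
assumptions: the names digitRanges and digitValue are made up, the table holds
just the three ranges quoted earlier, and Nullable from std.typecons is one
assumed way to report "not numeric" without a magic value. None of this is the
std.uni API.

```d
import std.typecons : Nullable;

// Hypothetical table: inclusive [first, last] ranges of decimal digits,
// sorted by code point. Only the three ranges quoted above are listed.
static immutable dchar[2][] digitRanges = [
    [ 0x0030, 0x0039], // ASCII digits
    [ 0x0660, 0x0669], // ARABIC-INDIC
    [ 0x06F0, 0x06F9], // EXTENDED ARABIC-INDIC
];

// Binary search over the ranges. Instead of returning a bool, it uses the
// start of the matching range to compute the digit value. A null result
// means c is not covered by the table.
Nullable!int digitValue(dchar c)
{
    Nullable!int result; // starts out null

    size_t lo = 0;
    size_t hi = digitRanges.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        immutable first = digitRanges[mid][0];
        immutable last = digitRanges[mid][1];

        if (c < first)
            hi = mid;          // c lies before this range
        else if (c > last)
            lo = mid + 1;      // c lies after this range
        else
        {
            // Found the entry: the value is the offset from its start.
            result = cast(int)(c - first);
            return result;
        }
    }
    return result; // still null: no range contains c
}

void main()
{
    assert(digitValue('\u0665').get == 5); // 0x0665 - 0x0660
    assert(digitValue('7').get == 7);
    assert(digitValue('x').isNull);        // not a digit, and no magic -1
}
```

Keeping "not found" separate from the value means a negative numeric value
such as -1/2 would not collide with the failure signal, which is the concern
with returning -1.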