spir wrote:
In my view, there is a missing level of abstraction in common UString processing libs 
and types. How to count the "â"s in a text? How to find one? Above, indexOf 
fails because my editor uses a precomposed code, while the source (here a literal) uses 
another form.
To be able to produce meaningful results, and to use simple routines like index, find, count..., 
the way we used to with single-length character sets, there should be a grouping phase on top of 
decoding; we would then process arrays of "stacks" representing characters, not of codes. 
To search, it's also necessary to have all characters in normalised form, so that both forms 
of "â" would match: another phase.
Unicode provides algorithms for those phases in constructing string representations -- 
but everyone seems to ignore the issues... With such a representation, s[0..1] would return 
the first character, not the first code of the "stack" representing the first character.
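
A minimal sketch of the two phases described above (normalise, then group codes into "stacks"), written in Python rather than D only because its standard library ships Unicode normalisation. The name to_stacks is hypothetical, and the grouping rule (attach every combining mark to the preceding base code) is a simplification of Unicode's full grapheme cluster segmentation, not a complete implementation of it.

```python
import unicodedata

def to_stacks(text):
    """Normalise to NFC, then group each base code with the combining marks after it."""
    stacks = []
    for code in unicodedata.normalize("NFC", text):
        # combining() returns a non-zero class for combining marks
        if stacks and unicodedata.combining(code) != 0:
            stacks[-1] += code      # attach the mark to the current stack
        else:
            stacks.append(code)     # start a new stack
    return stacks

precomposed = "\u00e2"   # "â" as a single code
decomposed = "a\u0302"   # "a" followed by COMBINING CIRCUMFLEX ACCENT

# A text holding both forms of "â"
text = "b" + decomposed + "t" + precomposed

# Code-level search only sees the form it was given
codes_found = text.count(precomposed)        # 1: the decomposed "â" is missed

# Stack-level search, with the needle normalised the same way
stacks = to_stacks(text)
target = to_stacks(decomposed)[0]
stacks_found = stacks.count(target)          # 2: both forms match
first_index = stacks.index(target)           # 1: position counted in characters

# A sequence with no precomposed form still ends up as one stack
q_stack = to_stacks("q\u0302x")              # ["q\u0302", "x"]
```

With this representation, slicing the list (stacks[0:1]) yields the first character as a whole, whatever number of codes it is made of.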



http://www.digitalmars.com/d/2.0/phobos/std_utf.html
