On Sat, 15 Jan 2011 12:11:59 -0500, Lutger Blijdestijn <[email protected]> wrote:

Steven Schveighoffer wrote:

...
I think a good standard to evaluate our handling of Unicode is to see
how easy it is to do things the right way. In the above, foreach would
slice the string grapheme by grapheme, and the == operator would perform a normalized comparison. While it works correctly, it's probably not the
most efficient way to do things.

I think this is a good alternative, but I'd rather not impose this on
people like myself who deal mostly with English.  I think this should be
possible to do with wrapper types or intermediate ranges which have
graphemes as elements (per my suggestion above).

Does this sound reasonable?

-Steve

If it's a matter of choosing which is the 'default' range, I'd think proper Unicode handling is more reasonable than catering only to English / ASCII.
Especially since this is already the case in the Phobos string algorithms.

It would work for English and (if I understand correctly) most other languages. Any language that can be built from composable graphemes would work. In fact, languages that use some graphemes that cannot be composed will also work to some degree (for example, opEquals).
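To make "composable graphemes" concrete: iterating by code point splits a decomposed character from its combining marks, while grapheme-level iteration keeps them together. Here is a rough illustration in Python (the `graphemes` helper is a naive sketch I made up for this example; real grapheme segmentation per Unicode UAX #29 handles many more cases):

```python
import unicodedata

def graphemes(s):
    """Naive sketch: group each base character with the combining
    marks that follow it. Real UAX #29 segmentation is more involved."""
    cluster = ""
    for ch in s:
        if cluster and unicodedata.combining(ch):
            cluster += ch  # attach combining mark to the current cluster
        else:
            if cluster:
                yield cluster
            cluster = ch
    if cluster:
        yield cluster

s = "re\u0301sume\u0301"       # "résumé" with both accents decomposed
print(len(s))                  # 8 code points
print(list(graphemes(s)))      # 6 user-perceived characters
```

Iterating `s` by code point (the char[]/dchar[] view) sees 8 elements; the grapheme view sees 6, which is what a user would call "the characters of the string".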

What I'm proposing (or think I'm proposing) is not exactly catering to English and ASCII; what I'm proposing is simply not catering to more complex languages such as Hebrew and Arabic. What I'm trying to find is a middle ground where most languages work, and the code is simple and efficient, with possibilities to jump down to lower levels for performance (i.e. switch to char[] when you know ASCII is all you are using) or jump up to full Unicode when necessary.
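The normalization point is easy to demonstrate. Two canonically equivalent spellings of "é" compare unequal when compared code unit by code unit, and equal only after normalization -- which is what the proposed default string type would do in opEquals. A Python illustration (the behavior is the same in any language that exposes raw code units):

```python
import unicodedata

# "é" written two canonically equivalent ways:
composed = "\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Raw code-unit comparison sees two different strings.
print(composed == decomposed)   # False

# After canonical normalization (NFC here) they compare equal.
nfc = unicodedata.normalize("NFC", decomposed)
print(composed == nfc)          # True
```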

Essentially, we would have three levels of types:

char[], wchar[], dchar[] -- Considered to be arrays in every way.
string_t!T (string, wstring, dstring) -- Specialized string types that do normalization to dchars, but do not handle perfectly all graphemes. Works with any algorithm that deals with bidirectional ranges. This is the default string type, and the type for string literals. Represented internally by a single char[], wchar[] or dchar[] array. * utfstring_t!T -- specialized string to deal with full unicode, which may perform worse than string_t, but supports everything unicode supports. May require a battery of specialized algorithms.

* - name up for discussion
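A minimal sketch of how the middle level's opEquals could behave, illustrated in Python with a hypothetical NormString wrapper (assuming NFC as the canonical form; an actual string_t would presumably normalize lazily while walking the underlying array rather than eagerly like this):

```python
import unicodedata

class NormString:
    """Hypothetical sketch of the string_t idea: store the raw
    code units, but compare under canonical normalization."""
    def __init__(self, s: str):
        self.raw = s  # underlying array is kept as-is

    def __eq__(self, other):
        if not isinstance(other, NormString):
            return NotImplemented
        return (unicodedata.normalize("NFC", self.raw)
                == unicodedata.normalize("NFC", other.raw))

    def __hash__(self):
        return hash(unicodedata.normalize("NFC", self.raw))

# Equivalent strings encoded differently compare equal:
print(NormString("caf\u00e9") == NormString("cafe\u0301"))  # True
```

The raw arrays remain available for the lower level, so code that knows it is dealing with ASCII can skip the normalization cost entirely.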

Also note that, as far as I can tell, Phobos currently does *no* normalization for things like opEquals. Two char[]s that represent equivalent strings, but not encoded the same way, will compare as !=.

-Steve
