On 3/9/14, 12:25 PM, Dmitry Olshansky wrote:
Okay, putting potential breakage aside, let me sketch out an additive way of improving the current situation.
Now you're talking.
1. Say we recognize as a "narrow string" any indexable entity of char/wchar/dchar that nevertheless has .front returning a dchar. Nothing fancy: it's just a generalization of isNarrowString. At least a range over Array!char would then work as a string.
Wait, why is dchar[] a narrow string?
2. Likewise, representation should be replaced by something more explicit, say byCodeUnit, that works on any isNarrowString per the above. The opposite of that is byCodePoint.
Fine.
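To make the code unit versus code point distinction in point 2 concrete, here is a minimal sketch. It assumes the `std.utf.byCodeUnit` that ships in later Phobos releases, which may not match the proposal above in every detail; `countCodePoints` and `countCodeUnits` are hypothetical helper names used only for illustration.

```d
import std.range : walkLength;
import std.string : representation;
import std.utf : byCodeUnit;

// Hypothetical helper: range primitives on a string auto-decode,
// so walking the range visits one dchar (code point) at a time.
size_t countCodePoints(string s)
{
    return s.walkLength;
}

// Hypothetical helper: byCodeUnit yields the raw char elements
// without decoding, one per UTF-8 code unit.
size_t countCodeUnits(string s)
{
    return s.byCodeUnit.walkLength;
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 takes two UTF-8 code units

    assert(countCodePoints(s) == 5);
    assert(countCodeUnits(s) == 6);

    // representation exposes the same data as immutable(ubyte)[].
    assert(s.representation.length == 6);
}
```

The same string reports two different lengths depending on the view, which is why an explicit name for each view is attractive.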
3. ElementEncodingType is too verbose and misleading. Something more explicit would be useful. ItemType/UnitType maybe?
We're stuck with that name.
4. We lack lots of good stuff from the Unicode standard. Some recently landed in std.uni. We need much more, and we should deprecate the crappy parts of std.string (text wrapping is one example).
Add away.
5. Most algorithms conceptually decode, but may be enhanced to work directly on UTF-8/UTF-16. That, together with 1, should IMHO solve most of our problems.
Great!
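As one illustration of point 5, substring search is an algorithm that conceptually decodes but does not have to. UTF-8 is self-synchronizing, so the code units of a valid needle can only match at a code point boundary in a valid haystack. The sketch below assumes valid UTF-8 on both sides; `indexOfCodeUnits` is a hypothetical name, not an existing Phobos function.

```d
import std.algorithm.searching : find;
import std.string : indexOf, representation;

// Hypothetical helper: substring search over raw UTF-8 code units,
// with no decoding. Returns the code unit index of the first match,
// or -1 if the needle does not occur.
ptrdiff_t indexOfCodeUnits(string haystack, string needle)
{
    auto h = haystack.representation; // immutable(ubyte)[]
    auto n = needle.representation;
    auto rest = h.find(n);            // plain array search

    // find returns an empty range when there is no match.
    if (rest.length < n.length)
        return -1;
    return cast(ptrdiff_t)(h.length - rest.length);
}

void main()
{
    string text = "na\u00EFve caf\u00E9";

    // Agrees with the library search, which also reports code unit indices.
    assert(indexOfCodeUnits(text, "caf\u00E9") == text.indexOf("caf\u00E9"));
    assert(indexOfCodeUnits(text, "tea") == -1);
}
```

Algorithms that depend on code point identity, such as case-insensitive comparison, are not covered by this shortcut.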
6. Take into account ASCII and maybe other alphabets? Should be as trivial as .assumeASCII and then on you march with all of std.algo/etc.
Walter is against that. His main argument is that UTF already covers ASCII with only a marginal cost (that can be avoided) and that we should go farther into the future instead of catering to an obsolete representation.
Andrei
