On 06/02/2016 01:54 PM, Marc Schütz wrote:
On Thursday, 2 June 2016 at 14:28:44 UTC, Andrei Alexandrescu wrote:
That's not going to work. A false impression created in this thread
has been that code points are useless

They _are_ useless for almost anything you can do with strings. The only
places where they should be used are std.uni and std.regex.

Again: What is the justification for using code points, in your opinion?
Which practical tasks are made possible (and work _correctly_) if you
decode to code points, that don't already work with code units?

Pretty much everything. Consider s and s1 string variables with possibly different encodings (UTF8/UTF16).

* s.all!(c => c == 'ö') works only with autodecoding. Without it, it always returns false.

* s.any!(c => c == 'ö') works only with autodecoding. Without it, it always returns false.

* s.balancedParens('〈', '〉') works only with autodecoding.

* s.canFind('ö') works only with autodecoding. Without it, it always returns false.

* s.commonPrefix(s1) works only if they both use the same encoding; otherwise it still compiles but silently produces an incorrect result.

* s.count('ö') works only with autodecoding. Without it, it always returns zero.

* s.countUntil(s1) is really odd - without autodecoding, whether it works at all, and the result it returns, depends on both encodings. With autodecoding it always works and returns a number independent of the encodings.

* s.endsWith('ö') works only with autodecoding. Without it, it always returns false.

* s.endsWith(s1) works only with autodecoding. Otherwise it compiles and runs but produces incorrect results if s and s1 have different encodings.

* s.find('ö') works only with autodecoding. Without it, it never finds the character.

* s.findAdjacent is a very interesting one. It works with autodecoding, but without it, it just does odd things.

* s.findAmong(s1) is also interesting. It works only with autodecoding.

* s.findSkip(s1) works only if s and s1 have the same encoding. Otherwise it compiles and runs but produces incorrect results.

* s.findSplit(s1), s.findSplitAfter(s1), s.findSplitBefore(s1) work only if s and s1 have the same encoding. Otherwise they compile and run but produce incorrect results.

* s.minCount, s.maxCount are unlikely to be terribly useful, but with autodecoding they consistently return the extremum numeric code point regardless of representation. Without it, they just return encoding-dependent and meaningless numbers.

* s.minPos, s.maxPos follow similar semantics.

* s.skipOver(s1) only works with autodecoding. Otherwise it compiles and runs but produces incorrect results if s and s1 have different encodings.

* s.startsWith('ö') works only with autodecoding. Without it, it always returns false.

* s.startsWith(s1) works only with autodecoding. Otherwise it compiles and runs but produces incorrect results if s and s1 have different encodings.

* s.until!(c => c == 'ö') works only with autodecoding. Otherwise, it will span the entire range.
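
To make a few of the cases above concrete, here is a minimal sketch in D. It assumes a Phobos that provides std.utf.byCodeUnit, which is used here only as a stand-in for "no autodecoding". The helper names findsByCodePoint and findsByCodeUnit and the sample strings are hypothetical, chosen just for illustration.

```d
import std.algorithm.searching : canFind, count, startsWith;
import std.utf : byCodeUnit;

// Hypothetical helper: searches with autodecoding, so each element
// of the string range is a decoded code point (dchar).
bool findsByCodePoint(string s, dchar c)
{
    return s.canFind(c);
}

// Hypothetical helper: searches the raw UTF-8 code units (char),
// which approximates what happens without autodecoding.
bool findsByCodeUnit(string s, dchar c)
{
    return s.byCodeUnit.canFind(c);
}

void main()
{
    string  s  = "söze"; // UTF-8: 'ö' is the two code units 0xC3 0xB6
    wstring s1 = "sö"w;  // UTF-16: 'ö' is the single code unit 0x00F6

    // Single-character search: only the decoded view sees 'ö'.
    assert(findsByCodePoint(s, 'ö'));
    assert(!findsByCodeUnit(s, 'ö')); // no single UTF-8 unit equals 0xF6

    assert(s.count('ö') == 1);
    assert(s.byCodeUnit.count('ö') == 0);

    // Mixed encodings: the code points match, the code units do not.
    assert(s.startsWith(s1));
    assert(!s.byCodeUnit.startsWith(s1.byCodeUnit));
}
```

The code-unit versions still compile and run. They just compare numbers that mean different things in each encoding, which is the silent failure mode described in the list.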

===

The intent of autodecoding was to make std.algorithm work meaningfully with strings. As is easy to see, I just went through std.algorithm.searching alphabetically and found issues with literally every primitive in there. It's an easy exercise to go forth with the others.


Andrei
