Re: The Case Against Autodecode

Andrei Alexandrescu via Digitalmars-d Thu, 02 Jun 2016 12:11:55 -0700

On 06/02/2016 01:54 PM, Marc Schütz wrote:

On Thursday, 2 June 2016 at 14:28:44 UTC, Andrei Alexandrescu wrote:

That's not going to work. A false impression created in this thread
has been that code points are useless


They _are_ useless for almost anything you can do with strings. The only
places where they should be used are std.uni and std.regex.

Again: What is the justification for using code points, in your opinion?
Which practical tasks are made possible (and work _correctly_) if you
decode to code points, that don't already work with code units?

Pretty much everything. Consider s and s1 string variables with possibly different encodings (UTF8/UTF16).

* s.all!(c => c == 'ö') works only with autodecoding. It always returns false without.

* s.any!(c => c == 'ö') works only with autodecoding. It always returns false without.


* s.balancedParens('〈', '〉') works only with autodecoding.

* s.canFind('ö') works only with autodecoding. It always returns false without.

* s.commonPrefix(s1) works only if they both use the same encoding; otherwise it still compiles but silently produces an incorrect result.


* s.count('ö') works only with autodecoding. It always returns zero without.

* s.countUntil(s1) is really odd - without autodecoding, whether it works at all, and the result it returns, depends on both encodings. With autodecoding it always works and returns a number independent of the encodings.

* s.endsWith('ö') works only with autodecoding. It always returns false without.

* s.endsWith(s1) works only with autodecoding. Otherwise it compiles and runs but produces incorrect results if s and s1 have different encodings.


* s.find('ö') works only with autodecoding. It never finds it without.

* s.findAdjacent is a very interesting one. It works with autodecoding, but without it it just does odd things.


* s.findAmong(s1) is also interesting. It works only with autodecoding.

* s.findSkip(s1) works only if s and s1 have the same encoding. Otherwise it compiles and runs but produces incorrect results.

* s.findSplit(s1), s.findSplitAfter(s1), s.findSplitBefore(s1) work only if s and s1 have the same encoding. Otherwise they compile and run but produce incorrect results.

* s.minCount, s.maxCount are unlikely to be terribly useful, but with autodecoding they consistently return the extremum numeric code point regardless of representation. Without, they just return encoding-dependent and meaningless numbers.


* s.minPos, s.maxPos follow similar semantics.

* s.skipOver(s1) only works with autodecoding. Otherwise it compiles and runs but produces incorrect results if s and s1 have different encodings.

* s.startsWith('ö') works only with autodecoding. It always returns false without.

* s.startsWith(s1) works only with autodecoding. Otherwise it compiles and runs but produces incorrect results if s and s1 have different encodings.

* s.until!(c => c == 'ö') works only with autodecoding. Otherwise, it will span the entire range.
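
The single-character cases in the list above can be sketched as follows for a UTF-8 string. This is only a minimal illustration, not a complete survey: the helper names (foundDecoded, countedDecoded, foundByUnit, countedByUnit) and the sample word are hypothetical, and std.utf.byCodeUnit is assumed here as the way to get a code-unit view of the string.

```d
import std.algorithm.searching : any, canFind, count;
import std.utf : byCodeUnit;

// Helper names are hypothetical, chosen only for this illustration.

// Autodecoded view: std.algorithm sees the string as a range of dchar.
bool foundDecoded(string s)     { return s.canFind('ö'); }
size_t countedDecoded(string s) { return s.count('ö'); }

// Code-unit view: byCodeUnit yields the raw UTF-8 chars one by one.
bool foundByUnit(string s)      { return s.byCodeUnit.canFind('ö'); }
size_t countedByUnit(string s)  { return s.byCodeUnit.count('ö'); }

void main()
{
    // In UTF-8, 'ö' (U+00F6) is stored as the two code units 0xC3 0xB6.
    string s = "schön";

    // With autodecoding the needle is matched against code points.
    assert(foundDecoded(s));
    assert(countedDecoded(s) == 1);
    assert(s.any!(c => c == 'ö'));

    // No single UTF-8 code unit equals 0xF6, so these cannot succeed.
    assert(!foundByUnit(s));
    assert(countedByUnit(s) == 0);
    assert(!s.byCodeUnit.any!(c => c == 'ö'));
}
```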

===

The intent of autodecoding was to make std.algorithm work meaningfully with strings. As is easy to see, I just went through std.algorithm.searching alphabetically and found issues literally with every primitive in there. It's an easy exercise to go forth with the others.
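
For operands with different encodings, a minimal sketch of what "work meaningfully" means might look like this. The helper names (prefixDecoded, prefixByUnit) and the sample strings are hypothetical, and std.utf.byCodeUnit is again assumed as the code-unit view.

```d
import std.algorithm.searching : startsWith;
import std.utf : byCodeUnit;

// Helper names are hypothetical, chosen only for this illustration.

// Autodecoded: both operands are compared code point by code point.
bool prefixDecoded(string s, wstring s1)
{
    return s.startsWith(s1);
}

// Code units: UTF-8 chars are compared directly against UTF-16 wchars.
bool prefixByUnit(string s, wstring s1)
{
    return s.byCodeUnit.startsWith(s1.byCodeUnit);
}

void main()
{
    string  s  = "schön";  // UTF-8, 6 code units
    wstring s1 = "schö"w;  // UTF-16, 4 code units

    // The prefix is recognized regardless of the two encodings.
    assert(prefixDecoded(s, s1));

    // 0xC3 (first UTF-8 unit of 'ö') is compared with 0x00F6 and mismatches,
    // so the code still compiles and runs but reports no prefix.
    assert(!prefixByUnit(s, s1));
}
```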



Andrei
