Re: The Case Against Autodecode

Timon Gehr via Digitalmars-d Thu, 02 Jun 2016 13:07:23 -0700

On 02.06.2016 21:05, Andrei Alexandrescu wrote:

On 06/02/2016 01:54 PM, Marc Schütz wrote:

On Thursday, 2 June 2016 at 14:28:44 UTC, Andrei Alexandrescu wrote:

That's not going to work. A false impression created in this thread
has been that code points are useless


They _are_ useless for almost anything you can do with strings. The only
places where they should be used are std.uni and std.regex.

Again: What is the justification for using code points, in your opinion?
Which practical tasks are made possible (and work _correctly_) if you
decode to code points, that don't already work with code units?


Pretty much everything. Consider s and s1 string variables with possibly
different encodings (UTF8/UTF16).

* s.all!(c => c == 'ö') works only with autodecoding. It returns always
false without.
...


Doesn't work. Shouldn't compile. (char and wchar shouldn't be comparable.)

assert("ö".all!(c => c == 'ö')); // fails

* s.any!(c => c == 'ö') works only with autodecoding. It returns always
false without.
...


Doesn't work. Shouldn't compile.

assert("ö".any!(c => c == 'ö')); // fails
assert(!"̃ö⃖".any!(c => c == 'ö')); // fails

* s.balancedParens('〈', '〉') works only with autodecoding.
...


Doesn't work, e.g. s="⟨⃖". Shouldn't compile.

* s.canFind('ö') works only with autodecoding. It returns always false
without.
...


Doesn't work. Shouldn't compile.

assert("ö".canFind!(c => c == 'ö')); // fails

* s.commonPrefix(s1) works only if they both use the same encoding;
otherwise it still compiles but silently produces an incorrect result.
...


Doesn't work. Shouldn't compile.

* s.count('ö') works only with autodecoding. It returns always zero
without.
....


Doesn't work. Shouldn't compile.

* s.countUntil(s1) is really odd - without autodecoding, whether it
works at all, and the result it returns, depends on both encodings.  With
autodecoding it always works and returns a number independent of the
encodings.
...


Doesn't work. Shouldn't compile.

* s.endsWith('ö') works only with autodecoding. It returns always false
without.
...


Doesn't work. Shouldn't compile.

* s.endsWith(s1) works only with autodecoding.


Doesn't work.

Otherwise it compiles and
runs but produces incorrect results if s and s1 have different encodings.
...


Shouldn't compile.

* s.find('ö') works only with autodecoding. It never finds it without.
...


Doesn't work. Shouldn't compile.

* s.findAdjacent is a very interesting one. It works with autodecoding,
but without it it just does odd things.
....


Doesn't work. Shouldn't compile.

* s.findAmong(s1) is also interesting. It works only with autodecoding.
...


Doesn't work. Shouldn't compile.

* s.findSkip(s1) works only if s and s1 have the same encoding.
Otherwise it compiles and runs but produces incorrect results.
...


Doesn't work. Shouldn't compile.

* s.findSplit(s1), s.findSplitAfter(s1), s.findSplitBefore(s1) work only
if s and s1 have the same encoding.


Doesn't work.

Otherwise they compile and run but produce incorrect results.
...


Shouldn't compile.

* s.minCount, s.maxCount are unlikely to be terribly useful but with
autodecoding it consistently returns the extremum numeric code unit
regardless of representation. Without, they just return
encoding-dependent and meaningless numbers.

* s.minPos, s.maxPos follow a similar semantics.
...


Hardly a point in favour of autodecoding.

* s.skipOver(s1) only works with autodecoding.


Doesn't work. Shouldn't compile.

Otherwise it compiles and
runs but produces incorrect results if s and s1 have different encodings.
...


Shouldn't compile.

* s.startsWith('ö') works only with autodecoding. Otherwise it compiles
and runs but produces incorrect results if s and s1 have different
encodings.
...


Doesn't work. Shouldn't compile.

* s.startsWith(s1) works only with autodecoding. Otherwise it compiles
and runs but produces incorrect results if s and s1 have different
encodings.
...



Doesn't work. Shouldn't compile.

* s.until!(c => c == 'ö') works only with autodecoding. Otherwise, it
will span the entire range.
...


Doesn't work. Shouldn't compile.

===

The intent of autodecoding was to make std.algorithm work meaningfully
with strings. As it's easy to see I just went through
std.algorithm.searching alphabetically and found issues literally with
every primitive in there. It's an easy exercise to go forth with the
others.
...

Basically all of those still don't work with UTF-32 (assuming your goal
is to operate on characters). You need to normalize and possibly iterate
on graphemes. Also, many of those functions actually have valid uses
intentionally operating on code units.

The "shouldn't compile" remarks ideally would be handled at the language
level: char/wchar/dchar should be incompatible types and char[], wchar[]
and dchar[] should be handled like all arrays.
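
To illustrate the point about normalization and graphemes, here is a
minimal sketch (not part of the original exchange). It assumes a Phobos
version in which strings are still autodecoded, and it uses an "ö"
spelled as 'o' followed by U+0308 COMBINING DIAERESIS; the variable
names are made up for the example.

```d
import std.algorithm.searching : canFind;
import std.range : walkLength;
import std.uni;                 // byGrapheme, normalize, NFC
import std.utf : byCodeUnit;

void main()
{
    // "ö" written as two code points: 'o' + U+0308 COMBINING DIAERESIS.
    string decomposed = "o\u0308";
    // The precomposed form, U+00F6 LATIN SMALL LETTER O WITH DIAERESIS.
    dchar precomposed = '\u00F6';

    // Autodecoding yields code points, so the precomposed 'ö' is not found
    // even though the string displays as "ö".
    assert(!decomposed.canFind(precomposed));

    // After NFC normalization the pair is composed into U+00F6 and the
    // search succeeds.
    assert(normalize!NFC(decomposed).canFind(precomposed));

    // Three views of the same text give three different lengths.
    assert(decomposed.byCodeUnit.walkLength == 3); // UTF-8 code units
    assert(decomposed.walkLength == 2);            // code points (autodecoded)
    assert(decomposed.byGrapheme.walkLength == 1); // graphemes
}
```

Only the last view matches what a reader would call one character, which
is why decoding to code points alone does not make searches like the
ones above correct.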
