On Friday, 13 May 2016 at 10:38:09 UTC, Jonathan M Davis wrote:
Ideally, algorithms would be Unicode-aware as appropriate, but the default would be to operate on code units, with wrappers to handle decoding by code point or grapheme. Then it's easy to write fast code while still allowing for full correctness. Granted, it's not necessarily easy to get correct code that way, but anyone who wants full correctness without caring about efficiency can just use ranges of graphemes. Ranges of code points are rare regardless.
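For illustration, the wrappers for that split already exist in Phobos (std.utf.byCodeUnit and std.uni.byGrapheme); only the default iteration would change. A minimal sketch:

```d
import std.algorithm.searching : canFind;
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    string s = "résumé";

    // Fast path: iterate the UTF-8 code units directly, no decoding.
    assert(s.byCodeUnit.canFind('s'));

    // Correct path: iterate graphemes (user-perceived characters),
    // which also handles combining marks that code points split up.
    assert(s.byGrapheme.walkLength == 6);

    // Today's default: auto-decoding to code points on every algorithm
    // call, which is neither the fastest nor the most correct option.
    assert(s.walkLength == 6); // 6 code points, 8 code units here
}
```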
char[], wchar[] etc. can simply be made non-ranges, so that the user has to choose between .byCodePoint, .byCodeUnit (or .representation, as it already exists), .byGrapheme, or even higher-level units like .byLine or .byWord. Ranges of char and wchar, however, would stay as they are today. That way it's harder to accidentally get it wrong.
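To make that concrete, here is roughly what calling code would look like once char[]/wchar[] are no longer ranges themselves (.byCodeUnit, .byGrapheme and .representation exist today; .byLine/.byWord as string properties would be new):

```d
import std.algorithm.searching : count;
import std.string : representation;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    string s = "Hello, wörld";

    // Under the proposal, passing `s` itself to a range algorithm
    // would no longer compile; the caller picks the unit explicitly:
    auto units     = s.byCodeUnit.count;  // 13 UTF-8 code units
    auto graphemes = s.byGrapheme.count;  // 12 user-perceived characters
    auto raw       = s.representation;    // immutable(ubyte)[], no decoding

    assert(units == 13 && graphemes == 12 && raw.length == 13);
}
```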
Based on what I've seen in previous conversations on auto-decoding over the past few years (be it in the newsgroup, on GitHub, or at DConf), most of the core devs think that auto-decoding was a major blunder that we continue to pay for. But unfortunately, even if we all agree that it was a huge mistake and want to fix it, the question remains of how to do that without breaking tons of code. And since, AFAIK, Andrei is still in favor of auto-decoding, we'd have a hard time going forward with plans to get rid of it even if we had come up with a good way of doing so. But I would love it if we could get rid of auto-decoding and clean up string handling in D.
There is a simple deprecation path that has already been suggested: `isInputRange` and friends can emit a helpful deprecation warning when they are instantiated with a type that currently triggers auto-decoding.
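For example, a rough sketch of what that could look like (this is not the actual std.range.primitives code; std.traits.isNarrowString and pragma(msg) are real, but the mechanism and wording here are just illustrative):

```d
import std.range.primitives : empty, front, popFront;
import std.traits : isNarrowString;

// The existing input-range checks, factored out unchanged.
private enum bool isInputRangeImpl(R) = is(typeof(
{
    R r = R.init;     // can define a range object
    if (r.empty) {}   // can test for emptiness
    r.popFront();     // can advance to the next element
    auto h = r.front; // can access the first element
}));

template isInputRange(R)
{
    static if (isNarrowString!R)
    {
        // char[]/wchar[] still qualify for now, but each instantiation
        // nudges users toward an explicit choice of iteration unit.
        pragma(msg, "Deprecation: auto-decoding of " ~ R.stringof ~
            " will be removed; use .byCodeUnit, .byCodePoint or .byGrapheme.");
        enum isInputRange = isInputRangeImpl!R;
    }
    else
    {
        enum isInputRange = isInputRangeImpl!R;
    }
}

void main()
{
    static assert(isInputRange!(int[])); // no message
    static assert(isInputRange!string);  // compiles, but prints the note
}
```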
