On Friday, 13 May 2016 at 10:38:09 UTC, Jonathan M Davis wrote:
Ideally, algorithms would be Unicode-aware as appropriate, but the default would be to operate on code units, with wrappers to handle decoding by code point or grapheme. Then it's easy to write fast code while still allowing for full correctness. Granted, it's not necessarily easy to get correct code that way, but anyone who wants full correctness without caring about efficiency can just use ranges of graphemes. Ranges of code points are rare regardless.
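For illustration, the wrappers for that split already exist in Phobos (std.utf.byCodeUnit and std.uni.byGrapheme); only the default iteration would change. A minimal sketch:

```d
import std.algorithm.searching : canFind;
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    string s = "résumé";

    // Fast path: iterate the UTF-8 code units directly, no decoding.
    assert(s.byCodeUnit.canFind('s'));

    // Correct path: iterate graphemes (user-perceived characters),
    // which also handles combining marks that code points split up.
    assert(s.byGrapheme.walkLength == 6);

    // Today's default: auto-decoding to code points on every algorithm
    // call, which is neither the fastest nor the most correct option.
    assert(s.walkLength == 6); // 6 code points, 8 code units here
}
```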
char[], wchar[] etc. can simply be made non-ranges, so that the user has to choose between .byCodePoint, .byCodeUnit (or .representation, as it already exists), .byGrapheme, or even higher-level units like .byLine or .byWord. Ranges of char and wchar, however, would stay as they are today. That way it's harder to accidentally get it wrong.
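To make that concrete, here is roughly what calling code would look like once char[]/wchar[] are no longer ranges themselves (.byCodeUnit, .byGrapheme and .representation exist today; .byLine/.byWord as string properties would be new):

```d
import std.algorithm.searching : count;
import std.string : representation;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    string s = "Hello, wörld";

    // Under the proposal, passing `s` itself to a range algorithm
    // would no longer compile; the caller picks the unit explicitly:
    auto units     = s.byCodeUnit.count;  // 13 UTF-8 code units
    auto graphemes = s.byGrapheme.count;  // 12 user-perceived characters
    auto raw       = s.representation;    // immutable(ubyte)[], no decoding

    assert(units == 13 && graphemes == 12 && raw.length == 13);
}
```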
Based on what I've seen in previous conversations on auto-decoding over the past few years (be it in the newsgroup, on GitHub, or at DConf), most of the core devs think that auto-decoding was a major blunder that we continue to pay for. But unfortunately, even if we all agree that it was a huge mistake and want to fix it, the question remains of how to do that without breaking tons of code. And since, AFAIK, Andrei is still in favor of auto-decoding, we'd have a hard time going forward with plans to get rid of it even if we had come up with a good way of doing so. But I would love it if we could get rid of auto-decoding and clean up string handling in D.
There is a simple deprecation path that has already been suggested: `isInputRange` and friends can emit a helpful deprecation warning when they are instantiated with a type that currently triggers auto-decoding.
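For example, a rough sketch of what that could look like (this is not the actual std.range.primitives code; std.traits.isNarrowString and pragma(msg) are real, but the mechanism and wording here are just illustrative):

```d
import std.range.primitives : empty, front, popFront;
import std.traits : isNarrowString;

// The existing input-range checks, factored out unchanged.
private enum bool isInputRangeImpl(R) = is(typeof(
{
    R r = R.init;     // can define a range object
    if (r.empty) {}   // can test for emptiness
    r.popFront();     // can advance to the next element
    auto h = r.front; // can access the first element
}));

template isInputRange(R)
{
    static if (isNarrowString!R)
    {
        // char[]/wchar[] still qualify for now, but each instantiation
        // nudges users toward an explicit choice of iteration unit.
        pragma(msg, "Deprecation: auto-decoding of " ~ R.stringof ~
            " will be removed; use .byCodeUnit, .byCodePoint or .byGrapheme.");
        enum isInputRange = isInputRangeImpl!R;
    }
    else
    {
        enum isInputRange = isInputRangeImpl!R;
    }
}

void main()
{
    static assert(isInputRange!(int[])); // no message
    static assert(isInputRange!string);  // compiles, but prints the note
}
```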
