On Wed, Mar 07, 2018 at 04:33:25PM +0000, Seb via Digitalmars-d wrote:
> On Wednesday, 7 March 2018 at 15:26:40 UTC, Jon Degenhardt wrote:
> > Auto-decoding is a significant issue for the applications I work on
> > (search engines). There is a lot of string manipulation in these
> > environments, and performance matters. Auto-decoding is a meaningful
> > performance hit. Otherwise, Phobos has a very nice collection of
> > algorithms for string manipulation. It would be great to have a way
> > to turn auto-decoding off in Phobos.
> Well, you can use byCodeUnit, which disables auto-decoding.
> Though it's not well-known and rather annoying to explicitly add it
> almost everywhere.
And therein lies the rub: because it's *auto* decoding, rather than just
decoding, it's implicit everywhere, adding to the performance hit
without the coder necessarily being aware of it. To avoid it, you have
to put in the effort to add .byCodeUnit everywhere.
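To make the implicit part concrete, here is a minimal sketch (the
sample string is just an illustrative assumption) of how the same
string is seen by range algorithms with and without .byCodeUnit:

```d
import std.range : ElementType, walkLength;
import std.utf : byCodeUnit;

void main()
{
    // "h\u00E9llo": U+00E9 takes 2 UTF-8 code units, so 6 code units total.
    string s = "h\u00E9llo";

    // As a range, a string yields decoded dchar elements (auto-decoding).
    static assert(is(ElementType!string == dchar));
    assert(s.length == 6);      // array length counts code units
    assert(s.walkLength == 5);  // range traversal decodes to code points

    // byCodeUnit opts out: elements are raw chars, nothing is decoded.
    auto r = s.byCodeUnit;
    static assert(is(ElementType!(typeof(r)) == immutable char));
    assert(r.walkLength == 6);
}
```

Note that nothing in the first half asks for decoding; it happens just
because a string was passed to a range function.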
Worse yet, it gives the false sense of security that you're doing
Unicode "right", when actually that is *not* true at all, because a code
point is not equal to a grapheme (what people normally know as a
"character"). But because operating at the code point level *appears* to
be correct 80% of the time, bugs in string handling often go unnoticed,
unlike operating at the code unit level, where any Unicode handling bugs
are immediately obvious as soon as your string contains non-ASCII
characters.
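A small sketch of the code point vs. grapheme mismatch, assuming a
string that uses a combining accent (the example string is mine, chosen
for illustration):

```d
import std.algorithm.comparison : equal;
import std.range : retro, walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: displayed as one
    // character, but stored as two code points (three UTF-8 code units).
    string s = "e\u0301";

    assert(s.byCodeUnit.walkLength == 3); // code units
    assert(s.walkLength == 2);            // code points (auto-decoded)
    assert(s.byGrapheme.walkLength == 1); // graphemes

    // Reversing by code point detaches the accent from its base letter,
    // even though the code "handles Unicode".
    assert(s.retro.equal("\u0301e"));
}
```

With precomposed characters the code point count and the grapheme count
agree, which is why this kind of bug tends to slip through testing.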
So you're essentially paying the price of a significant performance hit
for the dubious benefit of non-100%-correct code, but with bugs
conveniently obscured so that it's harder to notice them.
Kill autodecoding, I say. Kill it with fire!!