Re: Of possible interest: fast UTF8 validation

Andrei Alexandrescu via Digitalmars-d Wed, 16 May 2018 13:16:04 -0700

On 5/16/18 1:18 PM, Joakim wrote:

On Wednesday, 16 May 2018 at 16:48:28 UTC, Dmitry Olshansky wrote:
On Wednesday, 16 May 2018 at 15:48:09 UTC, Joakim wrote:
On Wednesday, 16 May 2018 at 11:18:54 UTC, Andrei Alexandrescu wrote:
https://www.reddit.com/r/programming/comments/8js69n/validating_utf8_strings_using_as_little_as_07/
Sigh, this reminds me of the old quote about people spending a bunch of time making more efficient what shouldn't be done at all.
Validating UTF-8 is super common; most text protocols and files these days would use it, and others would have an option to do so.
I'd like our validateUtf to be fast, since right now we do validation every time we decode a string. And THAT is slow. Trying to not validate on decode means most things should be validated on input...
I think you know what I'm referring to, which is that UTF-8 is a badly designed format, not that input validation shouldn't be done.

I find this an interesting minority opinion, at least from the perspective of the circles I frequent, where UTF8 is unanimously heralded as a great design. Only a couple of weeks ago I saw Dylan Beattie give a very entertaining talk on exactly this topic: https://dotnext-piter.ru/en/2018/spb/talks/2rioyakmuakcak0euk0ww8/

If you could share some details on why you think UTF8 is badly designed and how you believe it could be/have been better, I'd be in your debt!



Andrei
