On 5/16/18 1:18 PM, Joakim wrote:
On Wednesday, 16 May 2018 at 16:48:28 UTC, Dmitry Olshansky wrote:
On Wednesday, 16 May 2018 at 15:48:09 UTC, Joakim wrote:
On Wednesday, 16 May 2018 at 11:18:54 UTC, Andrei Alexandrescu wrote:
https://www.reddit.com/r/programming/comments/8js69n/validating_utf8_strings_using_as_little_as_07/

Sigh, this reminds me of the old quote about people spending a bunch of time making more efficient what shouldn't be done at all.

Validating UTF-8 is super common: most text protocols and files these days use it, and others have an option to do so.

I’d like our validateUtf to be fast, since right now we validate every time we decode a string, and THAT is slow. Not validating on decode would mean most things have to be validated on input...
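To make the "validate once on input, then decode without checks" idea concrete, here is a minimal scalar sketch of the validation step, written in Python rather than D. It assumes the well-formed byte ranges from the Unicode Standard (Table 3-7). The function name `is_valid_utf8` is hypothetical and is not D's `std.utf` API, and this byte-at-a-time loop is not the SIMD technique from the linked article.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data is well-formed UTF-8 (Unicode Table 3-7)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                  # ASCII, one byte
            i += 1
            continue
        # Pick sequence length and the allowed range of the SECOND byte,
        # which is narrowed for some leads to reject overlongs, surrogates,
        # and code points above U+10FFFF.
        if 0xC2 <= b <= 0xDF:         # C0/C1 would only encode overlongs
            length, lo, hi = 2, 0x80, 0xBF
        elif b == 0xE0:               # exclude 3-byte overlongs
            length, lo, hi = 3, 0xA0, 0xBF
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b == 0xED:               # exclude surrogates U+D800..U+DFFF
            length, lo, hi = 3, 0x80, 0x9F
        elif b == 0xF0:               # exclude 4-byte overlongs
            length, lo, hi = 4, 0x90, 0xBF
        elif 0xF1 <= b <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b == 0xF4:               # cap at U+10FFFF
            length, lo, hi = 4, 0x80, 0x8F
        else:                         # 0x80..0xC1 or 0xF5..0xFF cannot lead
            return False
        if i + length > n:            # truncated sequence at end of input
            return False
        if not (lo <= data[i + 1] <= hi):
            return False
        for j in range(i + 2, i + length):   # remaining continuation bytes
            if not (0x80 <= data[j] <= 0xBF):
                return False
        i += length
    return True
```

Run once at the input boundary, a check like this lets later decoding assume well-formed input and skip per-decode validation, which is the trade-off described above. In real Python code, `bytes.decode("utf-8")` already performs equivalent strict validation; the sketch only spells out the rules.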

I think you know what I'm referring to, which is that UTF-8 is a badly designed format, not that input validation shouldn't be done.

I find this an interesting minority opinion, at least from the perspective of the circles I frequent, where UTF8 is unanimously heralded as a great design. Only a couple of weeks ago I saw Dylan Beattie give a very entertaining talk on exactly this topic: https://dotnext-piter.ru/en/2018/spb/talks/2rioyakmuakcak0euk0ww8/

If you could share some details on why you think UTF8 is badly designed and how you believe it could be/have been better, I'd be in your debt!


Andrei
