On 5/16/18 1:18 PM, Joakim wrote:
On Wednesday, 16 May 2018 at 16:48:28 UTC, Dmitry Olshansky wrote:
On Wednesday, 16 May 2018 at 15:48:09 UTC, Joakim wrote:
On Wednesday, 16 May 2018 at 11:18:54 UTC, Andrei Alexandrescu wrote:
https://www.reddit.com/r/programming/comments/8js69n/validating_utf8_strings_using_as_little_as_07/
Sigh, this reminds me of the old quote about people spending a bunch
of time making more efficient what shouldn't be done at all.
Validating UTF-8 is super common; most text protocols and files these
days use it, and others have an option to do so.
I’d like our validateUtf to be fast, since right now we do validation
every time we decode a string. And THAT is slow. Trying to not validate
on decode means most things should be validated on input...
I think you know what I'm referring to, which is that UTF-8 is a badly
designed format, not that input validation shouldn't be done.
I find this an interesting minority opinion, at least from the
perspective of the circles I frequent, where UTF-8 is unanimously
heralded as a great design. Only a couple of weeks ago I saw Dylan
Beattie give a very entertaining talk on exactly this topic:
https://dotnext-piter.ru/en/2018/spb/talks/2rioyakmuakcak0euk0ww8/
If you could share some details on why you think UTF-8 is badly designed
and how you believe it could be/have been better, I'd be in your debt!
Andrei