Re: Challenge: write a really really small front() for UTF8

w0rp Mon, 24 Mar 2014 05:57:32 -0700

On Monday, 24 March 2014 at 09:02:19 UTC, monarch_dodra wrote:

On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
Here's a baseline: http://goo.gl/91vIGc. Destroy!
Andrei
Before we roll this out, could we discuss a strategy/guideline in regards to detecting and handling invalid UTF sequences?
Having a fast "front" is fine and all, but if it means your program asserting in release (or worse, silently corrupting memory) just because the client was trying to read a bad text file, I'm unsure this is acceptable.

I would strongly advise at least offering an option, possibly via a template parameter, for turning error handling on or off, similar to how Python handles decoding. Examples below in Python 3.


b"\255".decode("utf-8", errors="strict") # UnicodeDecodeError

b"\255".decode("utf-8", errors="replace") # replacement character used

b"\255".decode("utf-8", errors="ignore") # Empty string, invalid sequence removed.

All three strategies are useful from time to time. I mainly reach for option three when I'm trying to get some text data out of some old broken databases or similar.
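
To make the idea of a selectable error policy concrete, here is a minimal Python sketch of a front()-style helper that decodes only the first code point and takes the policy as a parameter. The names front, decode_all and REPLACEMENT are hypothetical, chosen for illustration; this is not the proposed D implementation. It also assumes non-empty input and always skips exactly one byte on error, which is simpler than what Python's own decoder does for truncated sequences.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def _sequence_length(lead: int) -> int:
    """Expected UTF-8 sequence length for a lead byte (1 if ASCII or invalid)."""
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 1


def front(data: bytes, errors: str = "strict"):
    """Decode the first code point of non-empty `data`.

    Returns (text, bytes_consumed). Hypothetical helper for illustration.
    """
    if errors not in ("strict", "replace", "ignore"):
        raise ValueError("unknown error policy: " + errors)
    length = _sequence_length(data[0])
    try:
        # Let the built-in strict decoder validate the candidate sequence.
        return data[:length].decode("utf-8"), length
    except UnicodeDecodeError:
        if errors == "strict":
            raise
    # Assumption of this sketch: skip a single byte after an invalid sequence.
    return (REPLACEMENT if errors == "replace" else ""), 1


def decode_all(data: bytes, errors: str = "strict") -> str:
    """Decode a whole buffer by calling front() repeatedly."""
    pieces = []
    while data:
        text, consumed = front(data, errors)
        pieces.append(text)
        data = data[consumed:]
    return "".join(pieces)
```

With this shape the caller picks the cost: front(b"\255") raises UnicodeDecodeError, while the "replace" and "ignore" policies return a replacement character or an empty string and let decoding continue with the next byte.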

We may consider leaving the error checking on in -release for the 'strict' decoding, but throwing an Error instead of an Exception so the function can be nothrow. This would prevent memory corruption in release code. Whether to assert or throw an Error is up for debate.
