On Monday, 24 March 2014 at 16:31:42 UTC, Andrei Alexandrescu wrote:
On 3/24/14, 2:02 AM, monarch_dodra wrote:
On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
Here's a baseline: http://goo.gl/91vIGc. Destroy!
Andrei
Before we roll this out, could we discuss a strategy/guideline in
regards to detecting and handling invalid UTF sequences?
I think std.array.front should return the invalid dchar on
error, and popFront should attempt to resync on error.
Ignoring UTF errors is a lossy operation and has the potential
problem of irreversible data loss. For example, consider a
program which needs to process Windows-1251 files: it would need
to read the file from disk, convert to UTF-8, process it, convert
back to Windows-1251, and save it back to disk. If a bug causes
the UTF-8 conversion step to be accidentally skipped, the raw
Windows-1251 bytes get treated as UTF-8, every invalid sequence is
silently discarded, and all non-ASCII text in that file is lost.
I think UTF-8 decoding operations should, by default, throw on
UTF-8 errors. Ignoring UTF-8 errors should only be done
explicitly, with the programmer's consent.
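
To make the difference concrete, here is a rough sketch using today's std.utf. It is not the proposed std.array.front change. isValidUtf8 and losesData are hypothetical helper names made up for this post. The sample bytes are "При" encoded as Windows-1251, standing in for a file whose conversion step was skipped.

```d
import std.algorithm.searching : canFind;
import std.utf : byDchar, replacementDchar, UTFException, validate;

// Hypothetical helper: strict check, true only for well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s); // throws UTFException on the first bad sequence
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

// Hypothetical helper: true if lenient decoding yields U+FFFD.
// Simplification: input that already contains U+FFFD also returns true.
bool losesData(const(char)[] s)
{
    // byDchar substitutes replacementDchar for invalid sequences
    // instead of throwing.
    return s.byDchar.canFind(replacementDchar);
}

void main()
{
    // "При" in Windows-1251: 0xCF 0xF0 0xE8. Not valid UTF-8.
    immutable(ubyte)[] raw = [0xCF, 0xF0, 0xE8];
    auto text = cast(string) raw;

    assert(!isValidUtf8(text));          // strict: the bug is caught here
    assert(losesData(text));             // lenient: original bytes are gone
    assert(isValidUtf8("plain ASCII"));  // valid input passes the strict check
}
```

With the strict path the skipped conversion shows up as an exception at the first decode. With the lenient path the program keeps running and the original bytes cannot be recovered from the output.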