On Monday, 24 March 2014 at 16:31:42 UTC, Andrei Alexandrescu wrote:
On 3/24/14, 2:02 AM, monarch_dodra wrote:
On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
Here's a baseline: http://goo.gl/91vIGc. Destroy!

Andrei

Before we roll this out, could we discuss a strategy/guideline for
detecting and handling invalid UTF sequences?

I think std.array.front should return the invalid dchar on error, and popFront should attempt to resync on error.
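
For readers who want to see what "return an invalid marker and resync" could look like, here is a rough sketch in Python (used for illustration only; it is not D and not Phobos code). The function name `decode_resync`, the choice of U+FFFD as the marker, and the resync rule (skip any continuation bytes after a bad byte) are all assumptions, since the proposal above does not spell them out.

```python
INVALID = "\ufffd"  # assumed marker for "invalid dchar"


def decode_resync(data: bytes):
    """Yield one character per step; on a bad sequence yield INVALID and resync."""
    i = 0
    while i < len(data):
        lead = data[i]
        # Expected sequence length, judged from the lead byte alone.
        if lead < 0x80:
            n = 1
        elif 0xC0 <= lead < 0xE0:
            n = 2
        elif 0xE0 <= lead < 0xF0:
            n = 3
        elif 0xF0 <= lead < 0xF8:
            n = 4
        else:
            n = 1  # stray continuation byte or invalid lead; decoding fails below
        try:
            # Let the strict built-in decoder validate the candidate sequence.
            yield data[i:i + n].decode("utf-8")
            i += n
        except UnicodeDecodeError:
            yield INVALID
            i += 1
            # Resync: skip continuation bytes (10xxxxxx) until a possible lead byte.
            while i < len(data) and 0x80 <= data[i] < 0xC0:
                i += 1


print(list(decode_resync(b"a\xffb")))
print(list(decode_resync(b"\xe2\x82x")))
```

Note that the caller still gets a character for every bad sequence, so nothing is dropped silently, but the original bytes cannot be recovered from the marker.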

Ignoring UTF errors is lossy and can cause irreversible data loss. For example, consider a program that needs to process Windows-1251 files: it would read the file from disk, convert it to UTF-8, process it, convert it back to Windows-1251, and save it to disk. If a bug causes the conversion to UTF-8 to be skipped, the decoder silently discards the bytes it cannot interpret, and the non-ASCII text in that file is lost once the file is saved.
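
To make the failure mode concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself is about D). The sample text and variable names are made up. It shows Windows-1251 bytes being handed to a UTF-8 decoder that ignores errors, as would happen if the conversion step were skipped.

```python
# Hypothetical sample: Cyrillic text stored on disk as Windows-1251.
original = "Привет, world"
on_disk = original.encode("cp1251")

# Bug: the cp1251 -> UTF-8 conversion was skipped, so the raw bytes are
# treated as UTF-8. A lenient decoder silently drops what it cannot decode.
lenient = on_disk.decode("utf-8", errors="ignore")

# Writing the result back overwrites the file without the Cyrillic text.
saved = lenient.encode("cp1251")

print(repr(lenient))      # only the ASCII tail survives
print(saved == on_disk)   # the round trip did not preserve the file
```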

I think UTF-8 decoding operations should, by default, throw on UTF-8 errors. Ignoring UTF-8 errors should only be done explicitly, with the programmer's consent.
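
A sketch of the policy I have in mind, again in Python for illustration only: strict by default, lossy only when the caller asks for it. The helper `decode_utf8` and its `ignore_errors` parameter are hypothetical names, not an existing API.

```python
def decode_utf8(data: bytes, *, ignore_errors: bool = False) -> str:
    """Hypothetical helper: strict unless the caller explicitly opts out."""
    return data.decode("utf-8", errors="ignore" if ignore_errors else "strict")


bad = "Привет".encode("cp1251")  # Windows-1251 bytes, not valid UTF-8

try:
    decode_utf8(bad)
    raised = False
except UnicodeDecodeError:
    raised = True  # the bug surfaces immediately instead of corrupting data

# Lossy decoding is still available, but only with explicit consent.
explicit = decode_utf8(bad, ignore_errors=True)
```

With the strict default, the skipped-conversion bug from the example above fails loudly on the first bad byte rather than writing a damaged file.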
