Re: List of Phobos functions that allocate memory?

Dmitry Olshansky Fri, 07 Feb 2014 12:16:27 -0800

07-Feb-2014 21:07, Andrej Mitrovic writes:

On 2/7/14, Dmitry Olshansky <[email protected]> wrote:

Much simpler - it returns a special dchar to designate bad encoding. And
there is one defined by the Unicode spec.


A NaN for chars? Sounds great to me! :)

It's called \uFFFD (U+FFFD REPLACEMENT CHARACTER) and is specifically for bad encodings. I wonder why nobody perused the spec when writing std.utf.decode in the first place...


5.22 Best Practice for U+FFFD Substitution

When converting text from one character encoding to another, a conversion algorithm may encounter unconvertible code units. This is most commonly caused by some sort of corruption of the source data, so that it does not correctly follow the specification for that character encoding. Examples include dropping a byte in a multibyte encoding such as Shift-JIS, improper concatenation of strings, a mismatch between an encoding declaration and actual encoding of text, use of non-shortest form for UTF-8, and so on.

...

Whenever an unconvertible offset is reached during conversion of a code unit sequence:
1. The maximal subpart at that offset should be replaced by a single U+FFFD.
2. The conversion should proceed at the offset immediately after the maximal subpart.
---
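A minimal sketch of the quoted substitution rule, written in Python purely for illustration: it is not the std.utf code, and `decode_utf8_replace` is a hypothetical name. It assumes UTF-8 input and uses the well-formed byte ranges from the Unicode Standard (Table 3-7). Each maximal subpart becomes exactly one U+FFFD and decoding resumes at the next unconsumed byte, so malformed input never raises.

```python
REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def _seq_len(lead: int) -> int:
    """Expected length of a UTF-8 sequence starting with `lead`, or 0 if
    the byte can never start a well-formed sequence (0x80..0xC1, 0xF5..0xFF)."""
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def _second_byte_range(lead: int) -> tuple:
    """Allowed range for the byte after `lead`; the narrowed ranges reject
    overlong forms (E0, F0), surrogates (ED) and code points > U+10FFFF (F4)."""
    if lead == 0xE0:
        return (0xA0, 0xBF)
    if lead == 0xED:
        return (0x80, 0x9F)
    if lead == 0xF0:
        return (0x90, 0xBF)
    if lead == 0xF4:
        return (0x80, 0x8F)
    return (0x80, 0xBF)


def decode_utf8_replace(data: bytes) -> str:
    """Decode UTF-8, replacing each maximal subpart of an ill-formed
    sequence with a single U+FFFD instead of throwing."""
    out = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:  # ASCII fast path
            out.append(chr(lead))
            i += 1
            continue
        length = _seq_len(lead)
        if length == 0:  # byte that cannot start a sequence: its own subpart
            out.append(REPLACEMENT)
            i += 1
            continue
        cp = lead & (0x7F >> length)  # payload bits of the lead byte
        j = i + 1
        ok = True
        for k in range(1, length):
            if j >= n:  # truncated at end of input
                ok = False
                break
            lo, hi = _second_byte_range(lead) if k == 1 else (0x80, 0xBF)
            b = data[j]
            if not lo <= b <= hi:  # offending byte is NOT consumed
                ok = False
                break
            cp = (cp << 6) | (b & 0x3F)
            j += 1
        # On failure, data[i:j] is the maximal subpart: emit one U+FFFD.
        out.append(chr(cp) if ok else REPLACEMENT)
        i = j  # proceed immediately after the (sub)sequence
    return "".join(out)
```

With the example bytes `61 F1 80 80 E1 80 C2 62 80 63 80 BF 64`, the sketch yields `"a\uFFFD\uFFFD\uFFFDb\uFFFDc\uFFFD\uFFFDd"`: the truncated `F1 80 80` and `E1 80` each collapse to one U+FFFD, while the stray continuation bytes `80` and `BF` each get their own.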

Fast, simple and according to the standard. Best of all - no stinkin' exceptions! ;)


--
Dmitry Olshansky
