On Thursday, 29 August 2013 at 18:58:57 UTC, H. S. Teoh wrote:
No kidding! I was trying to write a program that navigates a website automatically using std.net.curl, and I'm running into all sorts of silly roadblocks, including std.encoding not supporting the iso-8859-* encodings.


It doesn't look like adding the rest of the ISO-8859 encodings would be all that difficult if you used the existing ISO-8859-1 (Latin1) as a base. I don't quite understand where and how transcoding is done though.
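
As a rough illustration of how little is involved, here is a minimal sketch of decoding ISO-8859-15 by reusing the Latin-1 rule (byte value equals code point) and patching the eight positions where the two differ. The names decodeLatin9 and latin9ToUtf8 are hypothetical helpers invented for this example; they are not part of std.encoding, and this says nothing about how std.encoding itself structures its transcoding.

```d
import std.array : appender;
import std.stdio : writeln;

// Hypothetical helper (not in Phobos): decode one ISO-8859-15 byte.
// Everything except eight slots is identical to ISO-8859-1.
dchar decodeLatin9(ubyte b)
{
    switch (b)
    {
        case 0xA4: return '\u20AC'; // EURO SIGN
        case 0xA6: return '\u0160'; // S WITH CARON
        case 0xA8: return '\u0161'; // s WITH CARON
        case 0xB4: return '\u017D'; // Z WITH CARON
        case 0xB8: return '\u017E'; // z WITH CARON
        case 0xBC: return '\u0152'; // LIGATURE OE
        case 0xBD: return '\u0153'; // LIGATURE oe
        case 0xBE: return '\u0178'; // Y WITH DIAERESIS
        default:   return cast(dchar) b; // Latin-1 rule: byte == code point
    }
}

// Hypothetical helper: convert a whole ISO-8859-15 buffer to a UTF-8 string.
string latin9ToUtf8(const(ubyte)[] input)
{
    auto result = appender!string();
    foreach (b; input)
        result.put(decodeLatin9(b)); // the appender encodes each dchar as UTF-8
    return result.data;
}

void main()
{
    immutable ubyte[] bytes = [0x31, 0x30, 0x20, 0xA4]; // "10 " plus 0xA4
    writeln(latin9ToUtf8(bytes));
}
```

The other ISO-8859 parts would need larger tables for the 0xA0-0xFF range, but the shape would presumably be the same.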

The good news is that on Linux, there's a handy utility called 'recode', which comes with a library called 'librecode', that supports converting between a huge number of different encodings (many more than probably you or I have imagined existed), including to/from Unicode. I know we don't like including external libraries in Phobos, but I honestly don't see any justification for reinventing the wheel by writing (and maintaining!) our own equivalent to librecode, unless licensing issues prevent us from including librecode in Phobos, nicely wrapped in a modern range-based D API.


However, because all of the XML special symbols should be ASCII, you should still be able to avoid decoding characters for the most part. It's only when you have to actually look at the content that Unicode would potentially matter. So the performance hit of decoding Unicode characters can mostly be avoided.
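
To make that concrete for UTF-8: every byte of a multi-byte sequence is 0x80 or above, so an ASCII byte such as '<' can never appear inside an encoded non-ASCII character, and markup can be located by comparing raw bytes. The sketch below assumes UTF-8 input; countTagOpens is a hypothetical name used only for illustration, and it deliberately ignores comments and CDATA sections.

```d
import std.stdio : writeln;
import std.string : representation;

// Hypothetical helper: count '<' delimiters by looking at raw bytes only.
// No code point is ever decoded here.
size_t countTagOpens(const(ubyte)[] utf8)
{
    size_t n;
    foreach (b; utf8)
    {
        if (b == '<')
            ++n;
    }
    return n;
}

void main()
{
    string doc = "<p>na\u00EFve caf\u00E9</p>"; // contains non-ASCII text
    // representation reinterprets the string as immutable(ubyte)[] without copying.
    writeln(countTagOpens(doc.representation));
}
```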
[...]

One way is to write the core code of std.xml in such a way that it handles all data as ubyte[] (or ushort[]/uint[] for 16-bit/32-bit encodings) so that it's encoding-independent. Then on top of this core, write some convenience wrappers that cast/convert to string, wstring, dstring. As an initial stab, we could support only UTF-8, UTF-16, UTF-32 if the user asks for string/wstring/dstring, and leave XML in other encodings up to the user to decode manually. This way, at least the user can get the data out of the file.
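
A minimal sketch of that layering, assuming an ASCII-compatible 8-bit encoding for the core and UTF-8 for the wrapper. Both function names (firstTextBytes, firstTextUtf8) are hypothetical and exist only for this example; nothing here reflects the actual std.xml API.

```d
import std.stdio : writeln;
import std.string : representation;
import std.utf : validate;

// Hypothetical core routine: return the raw bytes between the first '>'
// and the next '<'. It only compares against ASCII markup bytes and never
// interprets the text content.
const(ubyte)[] firstTextBytes(const(ubyte)[] data)
{
    size_t start = 0;
    while (start < data.length && data[start] != '>')
        ++start;
    if (start == data.length)
        return null; // no tag close found
    ++start; // skip the '>'

    size_t end = start;
    while (end < data.length && data[end] != '<')
        ++end;
    return data[start .. end];
}

// Hypothetical convenience wrapper for callers who want a string:
// checks that the bytes are valid UTF-8, then copies them into a string.
string firstTextUtf8(const(ubyte)[] xml)
{
    auto text = cast(const(char)[]) firstTextBytes(xml);
    validate(text); // throws UTFException on malformed UTF-8
    return text.idup;
}

void main()
{
    auto raw = "<a>h\u00E9llo</a>".representation;
    writeln(firstTextUtf8(raw));
}
```

A caller with data in some other 8-bit encoding could still call the byte-level routine and decode the returned slice by whatever means they have.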

Later on, once we've gotten our act together with std.encoding, we can hook it up to std.xml to provide autoconversion.


T
