On Thursday, 29 August 2013 at 18:58:57 UTC, H. S. Teoh wrote:
No kidding! I was trying to write a program that navigates a website
automatically using std.net.curl, and I'm running into all sorts of
silly roadblocks, including std.encoding not supporting iso-8859-*
encodings.
It doesn't look like adding the rest of the ISO-8859 encodings
would be all that difficult if you used the existing ISO-8859-1
(Latin1) as a base. I don't quite understand where and how
transcoding is done though.
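
To make the "Latin1 as a base" idea concrete, here is a minimal sketch for ISO-8859-15 (Latin9), which differs from ISO-8859-1 in only eight byte values. The function name `latin9ToUtf8` is made up for illustration and is not part of std.encoding; it only shows the table-override approach, not how std.encoding is actually structured.

```d
/// Hypothetical helper, not part of std.encoding.
/// Decodes ISO-8859-15 bytes into a UTF-8 string by treating the input
/// as Latin1 and overriding the eight positions where Latin9 differs.
string latin9ToUtf8(const(ubyte)[] input)
{
    string result;
    foreach (b; input)
    {
        dchar c;
        switch (b)
        {
            case 0xA4: c = '\u20AC'; break; // euro sign
            case 0xA6: c = '\u0160'; break; // S with caron
            case 0xA8: c = '\u0161'; break; // s with caron
            case 0xB4: c = '\u017D'; break; // Z with caron
            case 0xB8: c = '\u017E'; break; // z with caron
            case 0xBC: c = '\u0152'; break; // OE ligature
            case 0xBD: c = '\u0153'; break; // oe ligature
            case 0xBE: c = '\u0178'; break; // Y with diaeresis
            default:   c = cast(dchar) b; break; // same as Latin1
        }
        result ~= c; // appending a dchar to a string encodes it as UTF-8
    }
    return result;
}
```

The other ISO-8859 parts would need a full 96-entry table for the 0xA0-0xFF range rather than a handful of overrides, but the shape of the code would presumably be the same.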
The good news is that on Linux, there's a handy utility called
'recode', which comes with a library called 'librecode', that supports
converting between a huge number of different encodings -- probably
many more than you or I imagined existed -- including to/from Unicode.
I know we don't like including external libraries in Phobos, but I
honestly don't see any justification for reinventing the wheel by
writing (and maintaining!) our own equivalent to librecode, unless
licensing issues prevent us from including librecode in Phobos, nicely
wrapped in a modern range-based D API.
However, because all of the XML special symbols should be ASCII, you
should still be able to avoid decoding characters for the most part.
It's only when you have to actually look at the content that Unicode
would potentially matter. So the performance hit of decoding Unicode
characters can mostly be avoided.
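
A small sketch of why this works for UTF-8: every byte of a multi-byte sequence has its high bit set, so it can never be mistaken for an ASCII character such as '<'. The helper name `findTagStarts` is invented for this example; it only illustrates scanning code units without decoding.

```d
/// Hypothetical helper: returns the byte index of every '<' in a
/// UTF-8 document without decoding any code points.
size_t[] findTagStarts(string xml)
{
    size_t[] positions;
    // Asking foreach for `char` iterates over UTF-8 code units,
    // so no decoding to dchar takes place.
    foreach (i, char c; xml)
    {
        if (c == '<')
            positions ~= i;
    }
    return positions;
}
```

The returned indices are byte offsets, not character counts, which is what you want for slicing the original string anyway.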
[...]
One way is to write the core code of std.xml in such a way that it
handles all data as ubyte[] (or ushort[]/uint[] for 16-bit/32-bit
encodings) so that it's encoding-independent. Then on top of this core,
write some convenience wrappers that cast/convert to string, wstring,
dstring. As an initial stab, we could support only UTF-8, UTF-16, UTF-32
if the user asks for string/wstring/dstring, and leave XML in other
encodings up to the user to decode manually. This way, at least the user
can get the data out of the file.
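
A rough sketch of that layering, under heavy assumptions: `tagNamesRaw` and `tagNames` are invented names, and the "parser" only pulls out element names (it knows nothing about attributes, comments, or processing instructions). The point is just that the core sees raw code units and the wrapper casts to and from the string types.

```d
import std.string : representation;

/// Hypothetical encoding-independent core: works on raw code units
/// (ubyte, ushort, or uint) and only compares against ASCII values.
immutable(T)[][] tagNamesRaw(T)(immutable(T)[] data)
    if (is(T == ubyte) || is(T == ushort) || is(T == uint))
{
    immutable(T)[][] names;
    size_t i = 0;
    while (i < data.length)
    {
        if (data[i] != '<') { ++i; continue; }
        immutable start = i + 1;
        size_t end = start;
        // An element name ends at '>', a space, or '/'.
        while (end < data.length && data[end] != '>'
               && data[end] != ' ' && data[end] != '/')
            ++end;
        if (end > start)
            names ~= data[start .. end]; // slice, no copy, no decoding
        i = end;
    }
    return names;
}

/// Hypothetical convenience wrapper for string, wstring, and dstring.
S[] tagNames(S)(S xml)
    if (is(S == string) || is(S == wstring) || is(S == dstring))
{
    S[] result;
    foreach (raw; tagNamesRaw(xml.representation))
        result ~= cast(S) raw; // reinterpret the code units as characters
    return result;
}
```

With this, `tagNames("<a><b/>text</a>")` yields the names as string slices, and the same call with a `w` or `d` literal yields wstring or dstring slices. Input in any other encoding would have to go through `tagNamesRaw` directly, with decoding left to the caller.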
Later on, once we've gotten our act together with std.encoding, we can
hook it up to std.xml to provide autoconversion.
T