On Wednesday, 27 February 2013 at 10:56:16 UTC, Lubos Pintes
wrote:
Hi,
I would like to transparently convert from ANSI to UTF-8 when
dealing with text files. For example, here in Slovakia,
virtually every text file is in Windows-1250.
If someone opens a text file, he or she expects that it will
work properly. So I suppose it is not feasible to tell
someone "if you want to use my program, please convert every
text to UTF-8".
Obtaining the mapping from ANSI to Unicode for a particular
code page is trivial. Maybe even MultiByteToWideChar could
help with this.
However, I need to know how to do it the "D way". Could I
define something like a TextReader class? Or perhaps some
support already exists somewhere?
Thanks
I'd say the D way would be to simply exploit the fact that UTF
is built into the language: don't worry about the encoding
yourself, and work with raw code points.
You take your "code page to Unicode *code point*" table and
simply map each byte to a dchar. From there, D will convert
your raw Unicode (i.e. UTF-32) to UTF-8 on the fly, when you
need it. For example, writing to a file will automatically
convert the input to UTF-8. You can also simply use
std.conv.to!string to convert any UTF scheme to UTF-8 (or to
any other UTF, for that matter).
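
Something like this minimal sketch is what I mean (decodeCp1250
is just a name I made up, and the table here is a placeholder
filled with U+FFFD; you'd fill in the 128 real Windows-1250 code
points yourself):

import std.conv : to;
import std.stdio : writeln;

// Maps Windows-1250 bytes to Unicode code points: bytes below 0x80
// are plain ASCII, the rest are looked up in a caller-supplied
// 128-entry table.
dstring decodeCp1250(const(ubyte)[] raw, const dchar[128] upperHalf)
{
    dstring result;
    result.reserve(raw.length);
    foreach (b; raw)
        result ~= b < 0x80 ? cast(dchar) b : upperHalf[b - 0x80];
    return result;
}

void main()
{
    // Placeholder table: every non-ASCII byte becomes U+FFFD here.
    // A real program fills in the actual Windows-1250 code points
    // for bytes 0x80..0xFF.
    dchar[128] upperHalf = 0xFFFD;

    ubyte[] raw = [0x41, 0x68, 0x6F, 0x6A];  // "Ahoj", all ASCII
    dstring wide = decodeCp1250(raw, upperHalf);

    // UTF-32 -> UTF-8 is a single call; D handles the transcoding.
    string utf8 = to!string(wide);
    writeln(utf8);
}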
This may not be as efficient as a "true" code page to UTF-8
table, but:
1) Given that you'll most probably be I/O bound anyway, who cares?
2) Scalability. D does everything but the code page to code point
mapping. Why bother doing any more than that?
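
As for the file-reading case: you don't really need a TextReader
class for it. Read the raw bytes, run them through the mapping,
and let std.stdio re-encode on output. Roughly (the file names
and the placeholder table are again just for illustration):

import std.file : read;
import std.stdio : File;

void main()
{
    // Placeholder table again; fill in the real Windows-1250
    // code points for bytes 0x80..0xFF.
    dchar[128] upperHalf = 0xFFFD;

    // read() hands back the raw code-page bytes of the input file.
    auto raw = cast(const(ubyte)[]) read("input.txt");

    // Same byte -> code point mapping as above.
    dstring wide;
    foreach (b; raw)
        wide ~= b < 0x80 ? cast(dchar) b : upperHalf[b - 0x80];

    // Writing the UTF-32 text back out re-encodes it as UTF-8
    // on the fly.
    File("output.txt", "w").write(wide);
}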