I don't understand the CTFE usage in this context. I thought about
something like
dchar[] windows_1250 = [...];
Isn't this enough?
Thanks
On 27. 2. 2013 18:32, Dmitry Olshansky wrote:
On 27-Feb-2013 16:20, monarch_dodra wrote:
On Wednesday, 27 February 2013 at 10:56:16 UTC, Lubos Pintes wrote:
Hi,
I would like to transparently convert from ANSI to UTF-8 when dealing
with text files. For example here in Slovakia, virtually every text
file is in Windows-1250.
If someone opens a text file, he or she expects that it will work
properly. So I suppose, that it is not feasible to tell someone "if
you want to use my program, please convert every text to UTF-8".
To obtain the mapping from ANSI to Unicode for particular code page is
trivial. Maybe even MultibyteToWidechar could help with this.
I however need to know how to do it "D-way". Could I define something
like TextReader class? Or perhaps some support already exists somewhere?
Thanks
I'd say the D way would be to simply exploit the fact that UTF is built
into the language, and as such, not worry about encoding, and use raw
code points.
You get your "codepage to Unicode *code point*" table, and then you simply
map each character to a dchar. From there, D will itself convert your
raw unicode (aka UTF-32) to UTF8 on the fly, when you need it. For
example, writing to a file will automatically convert input to UTF-8.
You can also simply use std.conv.to!string to convert any UTF scheme to
UTF-8 (or any other UTF too for that matter).
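To make that concrete, here is a minimal sketch of the byte-to-dchar approach. The table is hypothetical: it uses the identity mapping as a placeholder, with just a few real Windows-1250 entries filled in for illustration; a real implementation would spell out all 256 entries.

```d
import std.conv : to;

// Placeholder Windows-1250 -> code point table. The full table has 256
// entries; 0x00-0x7F map to themselves, the upper half mostly does not.
immutable dchar[256] cp1250ToDchar = () {
    dchar[256] t;
    foreach (i; 0 .. 256)
        t[i] = cast(dchar) i; // identity as a stand-in
    // A few genuine Windows-1250 entries, for illustration:
    t[0x8A] = '\u0160'; // Š
    t[0x9A] = '\u0161'; // š
    t[0xE8] = '\u010D'; // č
    return t;
}();

// Map each input byte to a dchar, then let std.conv re-encode as UTF-8.
string decodeCp1250(const(ubyte)[] input)
{
    dchar[] result;
    result.reserve(input.length);
    foreach (b; input)
        result ~= cp1250ToDchar[b];
    return result.to!string; // UTF-32 -> UTF-8
}
```

The point is that only the table is codepage-specific; everything after the lookup is ordinary D string handling.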
A table that translates ANSI to UTF-8 is trivially constructible
using CTFE from the static one that does ANSI -> dchar.
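The CTFE step could look like the sketch below. It assumes a single-code-point table as above (identity here, standing in for the real Windows-1250 data) and derives a byte-to-UTF-8-string table from it at compile time.

```d
// Assumed byte -> code point table (identity placeholder).
immutable dchar[256] toCodepoint = () {
    dchar[256] t;
    foreach (i; 0 .. 256)
        t[i] = cast(dchar) i;
    return t;
}();

// Derived at compile time via CTFE: each byte maps straight to its
// UTF-8 encoding, so decoding needs no per-character conversion at runtime.
immutable string[256] toUtf8 = () {
    string[256] t;
    foreach (i; 0 .. 256)
        t[i] = "" ~ toCodepoint[i]; // appending a dchar UTF-8-encodes it
    return t;
}();

// Transcoding then reduces to lookup-and-append.
string decode(const(ubyte)[] input)
{
    string result;
    foreach (b; input)
        result ~= toUtf8[b];
    return result;
}
```

Both tables are computed once, at compile time; the runtime cost is a single array index per input byte.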
This may not be as efficient as a "true" codepage-to-UTF-8 table, but:
1) Given you'll most probably be IO bound anyways, who cares?
With in-memory transcoding you won't be. Text editors are typically all
in-memory or mmap-ed.
2) Scalability. D does everything but the code page to code point
mapping. Why bother doing any more than that?