In my efforts to learn D, I am writing some code to read files in different UTF encodings, with the aim of having them end up as UTF-8 internally. As a start, I have the following code:

import std.stdio;
import std.file;

void main(string[] args)
{
    if (args.length == 2)
    {
        if (args[1].exists && args[1].isFile)
        {
            auto f = File(args[1]);
            writeln(args[1]);

            for (auto i = 1; i <= 3; ++i)
                write(f.readln);
        }
    }
}

It works well, printing the file name and the first three lines of the file correctly regardless of the file's encoding. The one exception is UTF-16 files: for both the LE and BE encodings, two characters representing the BOM are printed.

I assume that write detects the encoding of the string returned by readln and prints it correctly, rather than readln converting its input to a consistent encoding. Is this correct?
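One way to test this assumption is to check whether the line returned by readln is valid UTF-8. The sketch below assumes that File.readln hands back the file's bytes without transcoding them. isValidUtf8 is a hypothetical helper built on std.utf.validate, which throws a UTFException on malformed UTF-8. If the assumption holds, the first line of a UTF-16 file should be reported as invalid, because the bytes 0xFE and 0xFF that make up its BOM never occur in well-formed UTF-8.

```d
import std.stdio : File, writefln;
import std.utf : UTFException, validate;

/// Hypothetical helper for this sketch: reports whether a byte sequence is
/// well-formed UTF-8. std.utf.validate throws UTFException when it is not.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main(string[] args)
{
    if (args.length != 2)
        return;
    auto f = File(args[1]);
    // Assumption under test: readln returns the line's raw bytes, unconverted.
    auto line = f.readln;
    writefln("first line: %s bytes, valid UTF-8: %s",
             line.length, isValidUtf8(cast(const(ubyte)[]) line));
}
```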

Is there a way to remove the BOM from the input buffer and still know the encoding of the file?
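A minimal sketch of one way to do this, assuming the whole file fits in memory: read the raw bytes with std.file.read, compare the leading bytes against the known BOM sequences, and slice the BOM off while keeping the detected encoding. The Encoding enum, BomInfo struct and detectBom function are hypothetical names for illustration only. Phobos's std.encoding module also provides a getBOM function that may be worth looking at instead. A file without a BOM is treated as UTF-8 here, which is only a guess, since a BOM is optional.

```d
import std.algorithm.searching : startsWith;
import std.file : exists, isFile, read;
import std.stdio : writefln;

// Hypothetical names for this sketch only.
enum Encoding { utf8, utf16le, utf16be, utf32le, utf32be }

struct BomInfo
{
    Encoding encoding;
    size_t length; // number of BOM bytes to skip (0 if no BOM was found)
}

BomInfo detectBom(const(ubyte)[] data)
{
    static immutable ubyte[] bom32le = [0xFF, 0xFE, 0x00, 0x00];
    static immutable ubyte[] bom32be = [0x00, 0x00, 0xFE, 0xFF];
    static immutable ubyte[] bom8    = [0xEF, 0xBB, 0xBF];
    static immutable ubyte[] bom16le = [0xFF, 0xFE];
    static immutable ubyte[] bom16be = [0xFE, 0xFF];

    // UTF-32LE must be tested before UTF-16LE: FF FE 00 00 begins with FF FE.
    if (data.startsWith(bom32le)) return BomInfo(Encoding.utf32le, 4);
    if (data.startsWith(bom32be)) return BomInfo(Encoding.utf32be, 4);
    if (data.startsWith(bom8))    return BomInfo(Encoding.utf8, 3);
    if (data.startsWith(bom16le)) return BomInfo(Encoding.utf16le, 2);
    if (data.startsWith(bom16be)) return BomInfo(Encoding.utf16be, 2);
    return BomInfo(Encoding.utf8, 0); // assumption: no BOM means UTF-8
}

void main(string[] args)
{
    if (args.length != 2 || !args[1].exists || !args[1].isFile)
        return;
    auto data = cast(const(ubyte)[]) read(args[1]);
    auto bom = detectBom(data);
    auto payload = data[bom.length .. $]; // file contents without the BOM
    writefln("%s: %s, %s BOM bytes, %s payload bytes",
             args[1], bom.encoding, bom.length, payload.length);
}
```

Note that FF FE 00 00 is ambiguous: it is the UTF-32LE BOM, but it could also be a UTF-16LE BOM followed by a U+0000 character. This sketch simply prefers UTF-32LE.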

Is there an idiomatic D way to do what I want to do?
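One possible approach, sketched under the assumptions that the file fits in memory and is either UTF-8 or UTF-16 (UTF-32 is left out): read the raw bytes, choose a decoding from the BOM, and let std.conv.to transcode UTF-16 code units into a UTF-8 string. decodeToUtf8 is a hypothetical helper name. Because the BOM is consumed during decoding, the lines printed afterwards should not start with the two BOM characters.

```d
import std.conv : to;
import std.exception : enforce;
import std.file : read;
import std.range : take;
import std.stdio : writeln;
import std.string : lineSplitter;
import std.utf : validate;

/// Hypothetical helper: converts raw file bytes to a UTF-8 string, choosing
/// the decoding from a leading BOM. Only UTF-8 and UTF-16 (LE/BE) are
/// handled; input without a BOM is assumed to be UTF-8.
string decodeToUtf8(const(ubyte)[] raw)
{
    // UTF-16 BOM: FF FE (little-endian) or FE FF (big-endian).
    if (raw.length >= 2 && ((raw[0] == 0xFF && raw[1] == 0xFE) ||
                            (raw[0] == 0xFE && raw[1] == 0xFF)))
    {
        immutable bigEndian = raw[0] == 0xFE;
        auto payload = raw[2 .. $];
        enforce(payload.length % 2 == 0, "odd byte count in UTF-16 data");

        // Assemble each 16-bit code unit explicitly so the result does not
        // depend on the byte order of the machine running the program.
        auto units = new wchar[](payload.length / 2);
        foreach (i, ref unit; units)
        {
            immutable ubyte hi = bigEndian ? payload[2 * i] : payload[2 * i + 1];
            immutable ubyte lo = bigEndian ? payload[2 * i + 1] : payload[2 * i];
            unit = cast(wchar)((hi << 8) | lo);
        }
        return units.to!string; // transcode UTF-16 to UTF-8
    }

    // UTF-8, with or without the EF BB BF BOM; skip the BOM if present.
    immutable size_t skip = (raw.length >= 3 && raw[0] == 0xEF &&
                             raw[1] == 0xBB && raw[2] == 0xBF) ? 3 : 0;
    auto text = cast(string) raw[skip .. $].idup;
    validate(text); // throws std.utf.UTFException on malformed UTF-8
    return text;
}

void main(string[] args)
{
    if (args.length != 2)
        return;
    auto text = decodeToUtf8(cast(const(ubyte)[]) read(args[1]));
    writeln(args[1]);
    // Print the first three lines of the decoded UTF-8 text.
    foreach (line; text.lineSplitter.take(3))
        writeln(line);
}
```

Decoding the whole file up front keeps everything after that point working on plain UTF-8 strings, at the cost of holding the file in memory; a streaming version would need to handle UTF-16 surrogate pairs split across read boundaries.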

Mike
