Files and UTF

Mike Surette via Digitalmars-d-learn Wed, 05 Aug 2020 10:40:47 -0700

In my efforts to learn D I am writing some code to read files in different UTF encodings with the aim of having them end up as UTF-8 internally. As a start I have the following code:


import std.stdio;
import std.file;


void main(string[] args)
{
    if (args.length == 2)
    {
        if (args[1].exists && args[1].isFile)
        {
            auto f = File(args[1]);
            writeln(args[1]);

            for (auto i = 1; i <= 3; ++i)
                write(f.readln);
        }
    }
}

It works well, outputting the file name and the first three lines of the file properly, without any regard to the encoding of the file. The exception to this is if the file is UTF-16: with both LE and BE encodings, two characters representing the BOM are printed.

I assume that write detects the encoding of the string returned by readln and prints it correctly, rather than readln reading the data in as a consistent encoding. Is this correct?

Is there a way to remove the BOM from the input buffer and still know the encoding of the file?
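
To make the question concrete, here is a minimal sketch of the kind of thing I have in mind. It assumes std.encoding.getBOM from Phobos can be applied to the raw bytes of the file; the Decoded struct and the splitBOM helper are hypothetical names of my own, not library functions. It only detects the BOM and strips it; it does not yet transcode the remaining bytes to UTF-8.

```d
import std.encoding : BOM, getBOM;
import std.file : read;
import std.stdio : writeln;

// Hypothetical helper type (not part of Phobos): the detected encoding
// plus the file contents with the BOM bytes removed.
struct Decoded
{
    BOM schema;             // BOM.none when no BOM was found
    const(ubyte)[] payload; // raw bytes following the BOM
}

// Hypothetical helper: look for a BOM at the start of the buffer and
// return the encoding it indicates together with the remaining bytes.
Decoded splitBOM(const(ubyte)[] bytes)
{
    auto bom = getBOM(bytes);
    // For BOM.none the sequence is empty, so nothing is skipped.
    return Decoded(bom.schema, bytes[bom.sequence.length .. $]);
}

void main(string[] args)
{
    if (args.length != 2)
        return;

    // Read the whole file as raw bytes so nothing is decoded yet.
    auto result = splitBOM(cast(const(ubyte)[]) read(args[1]));
    writeln(result.schema, ": ", result.payload.length, " bytes after the BOM");
}
```

A file without a BOM would come back as BOM.none here, so its encoding would still have to be guessed or assumed some other way.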


Is there a D idiomatic way to do what I want to do?

Mike
