On Monday, 16 January 2017 at 14:47:23 UTC, Era Scarecrow wrote:
On Sunday, 15 January 2017 at 19:48:04 UTC, Nestor wrote:
I see. So, to restate my original question:

How could I parse a UTF-16LE file line by line (producing a proper string in each iteration) without loading the entire file into memory?

You could... roll your own? Although if you wanted UTF-8 output instead, it would require a second pass or, better yet, changing how i is iterated.

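import std.stdio;

// Returns the raw UTF-16LE bytes of the next line (no transcoding, BOM not skipped).
char[] getLine16LE(File inp = stdin) {
    static char[1024*4] buffer; // 4k reusable buffer, NOT thread safe
    size_t i;
    // Read one 2-byte code unit at a time; stop at EOF or when the buffer is full.
    while (i + 2 <= buffer.length && inp.rawRead(buffer[i .. i+2]).length == 2) {
        // '\n' in UTF-16LE is the byte pair 0x0A 0x00, so check both bytes
        if (buffer[i] == '\n' && buffer[i+1] == 0)
            break;

        i += 2;
    }

    return buffer[0 .. i];
}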

Thanks, but unfortunately this function does not produce proper UTF-8 strings; as a matter of fact, the output even starts with the BOM. Also, it doesn't handle CRLF, and even for LF-terminated lines it doesn't seem to work for lines other than the first.

I guess I have to code encoding detection, buffered reading, and transcoding by hand. The only problem is that the result could be sub-optimal, which is why I was looking for a built-in solution.
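
For reference, a hand-rolled version might look roughly like the sketch below. It is only a sketch under a few assumptions: the input really is UTF-16LE (there is no encoding detection), a leading U+FEFF is treated as a BOM and dropped, and CRLF is treated like LF. readLineUTF16LE is a made-up name, not something from Phobos; File.rawRead and std.utf.toUTF8 are the existing Phobos calls it relies on.

```d
import std.stdio : File;
import std.utf : toUTF8;

/// Hypothetical helper (not part of Phobos): reads one line of UTF-16LE
/// from `inp` and stores it in `line` as UTF-8, without the line terminator.
/// Returns false once there is nothing left to read.
bool readLineUTF16LE(File inp, ref string line)
{
    wchar[] units;      // UTF-16 code units of the current line
    ubyte[2] pair;      // one code unit as raw bytes
    bool gotAny = false;

    // File is buffered by the C runtime, so 2-byte reads do not hit the disk each time.
    while (inp.rawRead(pair[]).length == 2)
    {
        gotAny = true;
        // Little-endian: low byte first. Composing the unit by hand keeps
        // this independent of the host byte order.
        immutable wchar u = cast(wchar)(pair[0] | (pair[1] << 8));
        if (u == '\n')
            break;
        units ~= u;
    }

    if (!gotAny)
        return false;

    // Drop a leading U+FEFF. Simplification: this strips it at the start
    // of any line, not only at the start of the file.
    if (units.length && units[0] == 0xFEFF)
        units = units[1 .. $];
    // Treat CRLF like LF by dropping the trailing CR.
    if (units.length && units[$ - 1] == '\r')
        units = units[0 .. $ - 1];

    // Transcode UTF-16 to UTF-8; throws a UTFException on malformed input.
    line = toUTF8(units);
    return true;
}
```

It would be called in a loop, something like while (readLineUTF16LE(f, line)) writeln(line); on a File opened with "rb". Only one line is held in memory at a time, though allocating a fresh array per line is probably not the fastest way to do it.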
