Re: Parsing a UTF-16LE file line by line, BUG?

Nestor via Digitalmars-d-learn Tue, 17 Jan 2017 03:46:05 -0800

On Monday, 16 January 2017 at 14:47:23 UTC, Era Scarecrow wrote:

On Sunday, 15 January 2017 at 19:48:04 UTC, Nestor wrote:
I see. So correcting my original doubt:
How could I parse a UTF16LE file line by line (producing a proper string in each iteration) without loading the entire file into memory?
Could... roll your own? Although if you wanted it to be UTF-8 output instead, it would require a second pass or, better yet, changing how i is iterated.
char[] getLine16LE(File inp = stdin) {
    static char[1024*4] buffer; // 4k reusable buffer, NOT thread safe
    int i;
    while(inp.rawRead(buffer[i .. i+2]) != null) {
        if (buffer[i] == '\n')
            break;

        i+=2;
    }

    return buffer[0 .. i];
}

Thanks, but unfortunately this function does not produce proper UTF8 strings; as a matter of fact, the output even starts with the BOM. Also, it doesn't handle CRLF, and even for LF-terminated lines it doesn't seem to work for lines other than the first.

I guess I have to code encoding detection, buffered reading, and transcoding by hand. The only problem is that the result could be sub-optimal, which is why I was looking for a built-in solution.
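
For illustration, here is a rough sketch of what the hand-rolled approach might look like. It is not a built-in Phobos facility: eachLineUTF16LE, its sink parameter, and the 4096-byte chunk size are names and values invented for this example. It assumes the input really is UTF-16LE (no encoding detection), skips a leading BOM if one is present, strips the CR of CRLF line endings, and transcodes each line with std.utf.toUTF8. Only File.byChunk and toUTF8 come from the standard library.

```d
import std.stdio : File;
import std.utf : toUTF8;

/// Hypothetical helper (not part of Phobos): reads a UTF-16LE file in
/// fixed-size chunks and calls `sink` once per line with a UTF-8 string.
/// The BOM and the line terminator (LF or CRLF) are not passed on.
void eachLineUTF16LE(File inp, scope void delegate(string) sink)
{
    wchar[] line;          // UTF-16 code units of the line being assembled
    bool haveLow = false;  // true when the low byte of a code unit is pending
    ubyte low;             // pending low byte (may carry over between chunks)
    bool atStart = true;   // true until the first code unit has been seen

    void emit()
    {
        if (line.length && line[$ - 1] == '\r')
            line = line[0 .. $ - 1];   // CRLF: drop the trailing CR
        sink(toUTF8(line));            // allocates a fresh UTF-8 string
        line.length = 0;
        line.assumeSafeAppend();       // reuse the buffer for the next line
    }

    foreach (ubyte[] chunk; inp.byChunk(4096))
    {
        foreach (b; chunk)
        {
            if (!haveLow)
            {
                low = b;
                haveLow = true;
                continue;
            }
            haveLow = false;
            // Little-endian: the low byte comes first, then the high byte.
            wchar unit = cast(wchar)(low | (b << 8));

            if (atStart)
            {
                atStart = false;
                if (unit == 0xFEFF)    // byte order mark, skip it
                    continue;
            }

            if (unit == '\n')
                emit();
            else
                line ~= unit;
        }
    }

    // Last line without a terminating LF. A stray odd byte at the very
    // end of the file is silently ignored in this sketch.
    if (line.length)
        emit();
}
```

Because whole 16-bit code units are compared against LF, a byte with the value 0x0A that is only half of another character is not mistaken for a line break, and only one chunk plus the current line are held in memory at a time. A caller would pass something like File("input.txt", "rb") (a placeholder file name) together with a delegate that consumes each line. Invalid surrogate sequences make toUTF8 throw a UTFException, which this sketch leaves to the caller.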
