On Monday, 15 March 2021 at 01:36:08 UTC, sharkloc wrote:
I want to read the content of file.gz line by line. The following code does not cope well with large files of hundreds of GB, and its memory overhead is also very large.

You can use the built-in std.zlib instead of shelling out. This example reads from stdin, but you can also replace it with a file handle:

import std.zlib;
import std.stdio;
import std.conv : to;
import std.array : split;
import std.algorithm.iteration : map;

void main() {

    UnCompress decmp = new UnCompress;
    string buf;

    // read 4096 bytes of the compressed stream per iteration
    foreach (chunk; stdin.byChunk(4096).map!(x => decmp.uncompress(x))) {

        // chunk has unknown length of decompressed data
        auto lines = to!string(chunk).split("\n");

        foreach (i, line; lines) {
            if (i + 1 == lines.length) {
                // the last piece may be incomplete, so we never
                // output it directly; append it to the buffer
                // (this also covers a chunk without any newline)
                buf ~= line;
            }
            else if (i == 0) {
                // if there is something in the buffer,
                // it belongs to the start of this line
                writeln(buf ~ line);

                // reset buffer
                buf.length = 0;
            }
            else {
                writeln(line);
            }
        }
    }

    // rest
    if (buf.length) {
        write(buf);
    }
}
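
To read from a file handle instead of stdin, only the source of the chunks needs to change. Here is a minimal, untested sketch of that variant. The path "file.gz" and the command-line argument handling are placeholders of mine, and I carry the incomplete last line over in buf the same way as above:

```d
import std.array : split;
import std.conv : to;
import std.stdio : File, write, writeln;
import std.zlib : UnCompress;

void main(string[] args)
{
    // Placeholder path; pass the real file as the first argument.
    string path = args.length > 1 ? args[1] : "file.gz";

    auto decmp = new UnCompress;
    string buf; // incomplete last line carried over between chunks

    // Open in binary mode and feed 4096 compressed bytes per iteration.
    foreach (chunk; File(path, "rb").byChunk(4096))
    {
        auto lines = to!string(decmp.uncompress(chunk)).split("\n");
        if (lines.length == 0)
            continue; // no decompressed output for this chunk

        // Prepend what was left over from the previous chunk.
        lines[0] = buf ~ lines[0];

        // Every piece except the last is terminated by a newline.
        foreach (line; lines[0 .. $ - 1])
            writeln(line);

        // The last piece may be incomplete, keep it for the next round.
        buf = lines[$ - 1];
    }

    // Remaining data that had no trailing newline.
    if (buf.length)
        write(buf);
}
```

Only one compressed chunk, its decompressed output, and the current partial line are held at a time, so memory use should not grow with the file size as long as individual lines stay reasonably short.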

