On Thursday, 4 January 2018 at 02:44:09 UTC, Steven Schveighoffer wrote:
On 1/3/18 12:03 PM, Andrew wrote:

Thanks for looking into this.


So it looks like the file you have is a concatenated gzip file. If I gunzip the file and recompress it, it works properly.

Looking at the docs of zlib inflate [1]:

"Unlike the gunzip utility and gzread() ..., inflate() will not automatically decode concatenated gzip streams. inflate() will return Z_STREAM_END at the end of the gzip stream. The state would need to be reset to continue decoding a subsequent gzip stream."

So what is happening is the inflate function is returning Z_STREAM_END, and I'm considering the stream done from that return code.

I'm not sure yet how to fix this. I suppose I can check if any more data exists, and then re-init and continue. I have to look up what a concatenated gzip file is. gzread isn't good for generic purposes, because it requires an actual file input (I want to support any input type, including memory data).

-Steve

[1] https://github.com/dlang/phobos/blob/master/etc/c/zlib.d#L874
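
The "check for more data, then re-init and continue" idea above can be illustrated with a minimal sketch. It uses Python's zlib module rather than the D wrapper discussed in this thread, purely because it exposes the same inflate behaviour in a few lines; the function name decompress_concatenated and the sample data are hypothetical, and this is not the fix that was eventually applied to any library.

```python
import gzip
import zlib

def decompress_concatenated(data: bytes) -> bytes:
    """Decode every gzip member in `data`, not just the first one."""
    out = []
    while data:
        # A fresh decompressor per member plays the role of "re-init".
        # wbits=31 selects gzip framing with the maximum window size.
        d = zlib.decompressobj(wbits=31)
        out.append(d.decompress(data))
        out.append(d.flush())
        if not d.eof:
            raise ValueError("truncated gzip member")
        # Bytes that follow the end of this member are left in unused_data.
        data = d.unused_data
    return b"".join(out)

# Two independently compressed members glued together (hypothetical data).
two_members = gzip.compress(b"first\n") + gzip.compress(b"second\n")

# A single decompressor stops at the end of the first member.
first_only = zlib.decompressobj(wbits=31).decompress(two_members)

# Looping until the input is exhausted recovers both members.
everything = decompress_concatenated(two_members)
```

Because the loop works on a bytes object, it does not depend on a real file, which matches the goal of supporting any input type. A streaming version would need to carry unused_data over between input chunks.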

Ah, thank you, that makes sense. These files are compressed with the bgzip utility so that they can be indexed, which means specific rows can be extracted quickly (there are more details here: http://www.htslib.org/doc/tabix.html, and the code can be found here: https://github.com/samtools/htslib/blob/develop/bgzf.c).
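
As a rough illustration of why compressing in independent blocks makes indexing possible, here is a simplified Python sketch. It is not the BGZF format itself and does not use htslib; the block contents, the offsets list, and read_block are all hypothetical names chosen for the example. The only assumption is that each block is written as its own gzip member, so decoding can start at any member boundary.

```python
import gzip
import zlib

# Hypothetical data: three blocks of rows, each compressed as its own gzip member.
blocks = [b"row0\nrow1\n", b"row2\nrow3\n", b"row4\nrow5\n"]

offsets = []      # toy index: byte offset at which each member starts
archive = b""
for block in blocks:
    offsets.append(len(archive))
    archive += gzip.compress(block)

def read_block(archive: bytes, offset: int) -> bytes:
    """Decode only the gzip member that starts at `offset`."""
    d = zlib.decompressobj(wbits=31)  # wbits=31 selects gzip framing
    # Decompression stops at the end of this member; later members are ignored.
    return d.decompress(archive[offset:])

# Fetch the middle block without decompressing the one before it.
middle = read_block(archive, offsets[1])
```

With a real file one would seek to the stored offset instead of slicing an in-memory buffer. The cost of this layout is exactly the behaviour discussed above: a reader that stops at the first end-of-stream marker sees only the first block.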
