On Thursday, 4 January 2018 at 12:15:27 UTC, Steven Schveighoffer wrote:
On 1/4/18 7:01 AM, Andrew wrote:

Ah, thank you, that makes sense. These types of files are compressed using the bgzip utility so that the file can be indexed, meaning specific rows can be extracted quickly (there are more details here: http://www.htslib.org/doc/tabix.html, and the code can be found here: https://github.com/samtools/htslib/blob/develop/bgzf.c).
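For context on why the index makes row extraction fast: per the SAM/BAM format specification, a BGZF position is stored as a 64-bit "virtual file offset". The upper 48 bits hold the file offset where a compressed block starts, and the lower 16 bits hold an offset into that block's uncompressed data. A reader can seek straight to the block, inflate only that one block, and skip forward. Below is a minimal Python sketch of packing and unpacking such offsets. The helper names are hypothetical and are not htslib's API.

```python
from typing import Tuple

# Hypothetical helpers illustrating BGZF virtual file offsets (SAM/BAM spec):
# upper 48 bits = file offset of a compressed block,
# lower 16 bits = offset inside that block's uncompressed data.


def make_virtual_offset(block_start: int, within_block: int) -> int:
    """Pack a compressed-block file offset and an in-block offset."""
    if not 0 <= within_block < 1 << 16:
        raise ValueError("in-block offset must fit in 16 bits")
    if not 0 <= block_start < 1 << 48:
        raise ValueError("block offset must fit in 48 bits")
    return (block_start << 16) | within_block


def split_virtual_offset(voffset: int) -> Tuple[int, int]:
    """Return (block_start, within_block) for a virtual offset."""
    return voffset >> 16, voffset & 0xFFFF


# To reach an indexed row: seek to block_start in the file, inflate that
# single block, then skip within_block bytes of its output.
v = make_virtual_offset(123456, 789)
block_start, within_block = split_virtual_offset(v)
```

Because each block holds at most 2^16 bytes of uncompressed data, 16 bits are always enough for the in-block part.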

Hm... that utility seems to say it will produce a file with a .bgz extension? So this must be an extraction from one of those files?

In any case, I'll figure out how to deal with concatenated gzip files and update iopipe. The next version will focus on a bunch of things relating to the two zip threads recently posted here.
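Handling a concatenated gzip file mostly comes down to not stopping at the end of the first member: when one deflate stream ends, any leftover input is the start of the next member. Below is a minimal Python sketch of that loop using the standard zlib module. It only illustrates the general technique and is not iopipe's API. The helper name is hypothetical.

```python
import gzip
import zlib


def decompress_concatenated(data: bytes) -> bytes:
    """Inflate every gzip member in `data`, one after another."""
    out = []
    while data:
        # wbits=31 (16 + 15) tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(wbits=31)
        out.append(d.decompress(data) + d.flush())
        if not d.eof:
            raise ValueError("truncated gzip member")
        # Whatever follows the end of this member is the next member.
        data = d.unused_data
    return b"".join(out)


# Two independently compressed members glued together, similar in spirit
# to the sequence of blocks bgzip writes.
blob = gzip.compress(b"chr1\t100\n") + gzip.compress(b"chr1\t200\n")
result = decompress_concatenated(blob)
```

For comparison, Python's own `gzip.decompress` already accepts multi-member input, so a decoder that stops after the first member is the outlier here.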

Thanks!

-Steve

That would be really great for me, thank you! By default, bgzip produces a file with the standard .gz extension. Looking at the code, it adds an extra field to the standard gzip header:

/* BGZF/GZIP header (specialized from RFC 1952; little endian):
 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
 | 31|139|  8|  4|              0|  0|255|      6| 66| 67|      2|BLK_LEN|
 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  BGZF extension:
                ^                               ^   ^   ^
                |                               |   |   |
            FLG.EXTRA                        XLEN   B   C

  BGZF format is compatible with GZIP. It limits the size of each compressed
  block to 2^16 bytes and adds an extra "BC" field in the gzip header which
  records the size.
*/
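Given that layout, a reader can find every block boundary without inflating anything. It checks the fixed gzip bytes and FLG.EXTRA, reads XLEN, scans the extra subfields for the one tagged "BC", and takes its 2-byte little-endian value. Per the SAM/BAM format specification, that value (called BSIZE there, BLK_LEN in the comment) is the total block size minus one. Below is a minimal Python sketch under those assumptions. Both function names are hypothetical. `make_bgzf_block` only exists to build test input by hand and is not how bgzip itself is implemented.

```python
import gzip
import struct
import zlib


def bgzf_block_sizes(data: bytes):
    """Yield (offset, total_size) for each BGZF block in `data`."""
    pos = 0
    while pos < len(data):
        # ID1, ID2, CM must be 31, 139, 8 and FLG.EXTRA (value 4) must be set.
        id1, id2, cm, flg = data[pos:pos + 4]
        if (id1, id2, cm) != (31, 139, 8) or not flg & 4:
            raise ValueError(f"not a BGZF block at offset {pos}")
        # XLEN is a little-endian uint16 at byte 10 of the header.
        (xlen,) = struct.unpack_from("<H", data, pos + 10)
        extra, end = pos + 12, pos + 12 + xlen
        bsize = None
        while extra + 4 <= end:
            si1, si2, slen = struct.unpack_from("<BBH", data, extra)
            if (si1, si2, slen) == (66, 67, 2):  # the "BC" subfield
                (bsize,) = struct.unpack_from("<H", data, extra + 4)
            extra += 4 + slen
        if bsize is None:
            raise ValueError(f"no BC subfield at offset {pos}")
        # BSIZE stores the total block size minus one.
        yield pos, bsize + 1
        pos += bsize + 1


def make_bgzf_block(payload: bytes) -> bytes:
    """Hypothetical helper: wrap a small payload in one BGZF block."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, no wrapper
    cdata = comp.compress(payload) + comp.flush()
    total = 18 + len(cdata) + 8  # header + deflate data + CRC32/ISIZE trailer
    if total > 1 << 16:
        raise ValueError("payload too large for a single BGZF block")
    header = bytes([31, 139, 8, 4, 0, 0, 0, 0, 0, 255])  # ID..OS, MTIME = 0
    header += struct.pack("<HBBHH", 6, 66, 67, 2, total - 1)  # XLEN, B, C, SLEN, BSIZE
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return header + cdata + trailer


a = make_bgzf_block(b"chr1\t100\n")
b = make_bgzf_block(b"chr1\t200\n")
sizes = list(bgzf_block_sizes(a + b))
```

Since each block is also a complete gzip member, an ordinary multi-member gzip reader (such as `gzip.decompress`) can read the same bytes. The BC field is only needed for jumping between blocks without decompressing.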

Thanks again!

Andrew
