On Thursday, 4 January 2018 at 12:15:27 UTC, Steven Schveighoffer
wrote:
On 1/4/18 7:01 AM, Andrew wrote:
Ah thank you, that makes sense. These types of files are
compressed using the bgzip utility so that the file can be
indexed, meaning specific rows can be extracted quickly (there
are more details here: http://www.htslib.org/doc/tabix.html and
the code can be found here:
https://github.com/samtools/htslib/blob/develop/bgzf.c)
Hm... that utility seems to say it will result in a bgz file
extension? So this must be an extraction from one of those
files?
In any case, I'll figure out how to deal with concatenated gzip
files, and update iopipe. The next version will focus on a bunch
of stuff relating to the 2 zip threads recently posted here.
Thanks!
-Steve
That would be really great for me, thank you! By default bgzip
produces a file with the standard .gz extension. Looking at the
code it adds an extra field to the standard gzip header:
/* BGZF/GZIP header (specialized from RFC 1952; little endian):
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 31|139|  8|  4|              0|  0|255|      6| 66| 67|      2|BLK_LEN|
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
BGZF extension:
               ^                               ^   ^   ^
               |                               |   |   |
           FLG.EXTRA                         XLEN  B   C
BGZF format is compatible with GZIP. It limits the size of each
compressed block to 2^16 bytes and adds an extra "BC" field in
the gzip header which records the size.
*/
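In case it helps, here is a rough Python sketch of how I read that header layout. It is not taken from htslib or iopipe. The names bgzf_block_size, iter_bgzf_blocks and make_bgzf_block are hypothetical helpers I made up for illustration. It assumes, per the BGZF section of the SAM/BAM specification, that BLK_LEN stores the total block size minus one, and it builds its own tiny two-block test input rather than reading a real file.

```python
import gzip
import struct
import zlib


def bgzf_block_size(data: bytes, offset: int = 0):
    """Return the total size of the BGZF block at `offset`, or None if the
    gzip member there does not carry the "BC" extra subfield."""
    head = data[offset:offset + 12]
    # ID1=31, ID2=139, CM=8 (deflate), and the FEXTRA bit (4) set in FLG.
    if len(head) < 12 or head[:3] != b"\x1f\x8b\x08" or not head[3] & 4:
        return None
    # XLEN follows MTIME (4 bytes), XFL (1) and OS (1), so it is at byte 10.
    (xlen,) = struct.unpack_from("<H", head, 10)
    extra = data[offset + 12:offset + 12 + xlen]
    if len(extra) < xlen:
        return None  # truncated input
    pos = 0
    # RFC 1952 extra subfields: SI1, SI2, SLEN, then SLEN bytes of data.
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if (si1, si2, slen) == (66, 67, 2) and pos + 6 <= len(extra):
            (blk_len,) = struct.unpack_from("<H", extra, pos + 4)
            return blk_len + 1  # assumption: stored as block size minus one
        pos += 4 + slen
    return None


def iter_bgzf_blocks(data: bytes):
    """Yield (offset, size) for each block by hopping from header to header."""
    offset = 0
    while offset < len(data):
        size = bgzf_block_size(data, offset)
        if size is None:
            raise ValueError(f"no BGZF header at offset {offset}")
        yield offset, size
        offset += size


def make_bgzf_block(payload: bytes) -> bytes:
    """Hypothetical demo helper: wrap `payload` in one BGZF-style gzip member."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # wbits=-15: raw deflate
    deflated = comp.compress(payload) + comp.flush()
    total = 18 + len(deflated) + 8  # 18-byte header + data + CRC32/ISIZE
    header = struct.pack("<4BIBBH2BHH",
                         31, 139, 8, 4,  # ID1, ID2, CM, FLG (FEXTRA)
                         0,              # MTIME
                         0, 255,         # XFL, OS (unknown)
                         6,              # XLEN
                         66, 67,         # 'B', 'C'
                         2,              # SLEN
                         total - 1)      # BLK_LEN
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload))
    return header + deflated + trailer


if __name__ == "__main__":
    data = make_bgzf_block(b"chr1\t100\n") + make_bgzf_block(b"chr1\t200\n")
    for offset, size in iter_bgzf_blocks(data):
        print("block at", offset, "size", size)
    # Every block is a complete gzip member, so an ordinary multi-member
    # gzip reader can decompress the concatenation as well.
    print(gzip.decompress(data))
```

The point of the "BC" field is that a reader can jump from block to block without inflating anything, which is what makes the index useful. A reader that only understands a single gzip member would stop after the first block.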
Thanks again!
Andrew