Hi,

2011/1/31 Sean Bigdatafun <sean.bigdata...@gmail.com>:
> GZIP is not splittable.

Correct, gzip is a stream compression format, which effectively means you can only start decompressing at the beginning of the data.

> Does that mean a GZIP block compressed sequencefile can't take advantage of
> MR parallelism?

AFAIK it should be splittable at the same block boundaries that were used when the compression was done.

> How to control the size of block to be compressed in SequenceFile?

Can't help you with that one.

-- 
Kind regards,
Niels Basjes
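
To illustrate the difference between one gzip stream and independently compressed blocks, here is a minimal sketch in plain Python. It is not Hadoop code and does not reproduce the SequenceFile layout; the helper names (`compress_in_blocks`, `can_decompress_from`) and the block size are assumptions made up for this example.

```python
import gzip
import zlib


def compress_in_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Compress each fixed-size slice of `data` as its own gzip stream.

    Hypothetical helper: it only mimics the idea of block compression,
    where every block can be decompressed without the others.
    """
    return [
        gzip.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def can_decompress_from(stream: bytes, offset: int) -> bool:
    """Return True if gzip decompression succeeds when started at `offset`."""
    try:
        gzip.decompress(stream[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        # No valid gzip header or deflate state at this position.
        return False


# Sample input: many small text records.
data = b"".join(b"record %d\n" % i for i in range(10000))

# Case 1: one gzip stream over everything. A reader has to start at byte 0.
whole = gzip.compress(data)
print("start of stream:", can_decompress_from(whole, 0))
print("middle of stream:", can_decompress_from(whole, len(whole) // 2))

# Case 2: independently compressed blocks. Any block can be read on its own,
# so different workers could each take a different block.
block_size = 16 * 1024  # arbitrary size chosen for the example
blocks = compress_in_blocks(data, block_size)
third_block = gzip.decompress(blocks[2])
print("third block matches:", third_block == data[2 * block_size:3 * block_size])
```

The second case is the reason a block compressed file can still be processed in parallel: a split only needs to begin at a block boundary, not at the start of the file.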