lzop compatible CompressionCodec
--------------------------------

                 Key: HADOOP-2424
                 URL: https://issues.apache.org/jira/browse/HADOOP-2424
             Project: Hadoop
          Issue Type: Improvement
          Components: io, native
            Reporter: Chris Douglas

LzoCodec currently outputs at most {{io.compression.codec.lzo.buffersize}} bytes (default 64k), less the compression overhead, per write (HADOOP-2402), in the following format:

{noformat}
[compressed block length (32)]
[compressed block]
{noformat}

lzop (the lzo-backed command-line utility) writes blocks in the following format:

{noformat}
[uncompressed block length (32)]
[compressed block length (32)]
[Adler-32|CRC-32 checksum of uncompressed block (32)]
[Adler-32|CRC-32 checksum of compressed block (32)]
[compressed block]
{noformat}

lzop also writes a header of roughly 32 bytes at the start of the file. I don't know of a standard for the format, but the lzop source should suffice as a reference.

Since we're using ".lzo" as the default extension, it's worth considering compatibility with lzop, though not necessarily for all lzo-compressed blocks. For example, SequenceFiles should continue to use the existing LzoCodec format.
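
To make the difference between the two block layouts concrete, here is a rough sketch in Python (not Hadoop code). It only frames a block that has already been compressed; the compressed bytes are treated as opaque, since no LZO binding is assumed. The function names are made up for illustration, and the big-endian byte order is an assumption that should be checked against the lzop source. The ~32 byte file header is not modelled at all.

```python
import struct
import zlib


def frame_lzocodec(compressed: bytes) -> bytes:
    """Current layout: [compressed block length (32)] [compressed block]."""
    # Assumption: the 32-bit length is written big-endian.
    return struct.pack(">I", len(compressed)) + compressed


def frame_lzop(uncompressed: bytes, compressed: bytes, checksum=zlib.adler32) -> bytes:
    """lzop-style layout: two lengths, two checksums, then the compressed block.

    `checksum` is zlib.adler32 or zlib.crc32, matching the Adler-32|CRC-32
    choice in the layout above.
    """
    header = struct.pack(
        ">IIII",
        len(uncompressed),
        len(compressed),
        checksum(uncompressed) & 0xFFFFFFFF,  # checksum of uncompressed block
        checksum(compressed) & 0xFFFFFFFF,    # checksum of compressed block
    )
    return header + compressed


def parse_lzop(frame: bytes, checksum=zlib.adler32):
    """Split one lzop-style block and verify the compressed-side checksum.

    Returns (uncompressed length, uncompressed checksum, compressed block);
    the caller would verify the uncompressed checksum after decompressing.
    """
    ulen, clen, usum, csum = struct.unpack_from(">IIII", frame, 0)
    body = frame[16:16 + clen]
    if len(body) != clen:
        raise ValueError("truncated block")
    if checksum(body) & 0xFFFFFFFF != csum:
        raise ValueError("compressed block checksum mismatch")
    return ulen, usum, body
```

The lzop-style header costs 12 more bytes per block than the current layout, which is small next to a 64k buffer but is one reason not to force it on every lzo-compressed block.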