Repository: parquet-format
Updated Branches:
  refs/heads/master ddc18a7af -> 84460c5a1

PARQUET-1124: Add LZ4 and Zstd compression codecs.

This adds LZ4 and Zstd compression codecs to the format spec. In recent
tests, Zstd appears to outperform other codecs (including brotli on
reads). LZ4 is widely available because it is built into Hadoop, making
it a good successor to snappy for fast compression and decompression
when speed is more important than compression ratio.

Author: Ryan Blue <[email protected]>

Closes #70 from rdblue/PARQUET-1124-add-compression-codecs and squashes the following commits:

939328e [Ryan Blue] PARQUET-1124: Add warning about external codec dependencies.
affad3d [Ryan Blue] PARQUET-1124: Add lz4 and zstd compression codecs.

Project: http://git-wip-us.apache.org/repos/asf/parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-format/commit/84460c5a
Tree: http://git-wip-us.apache.org/repos/asf/parquet-format/tree/84460c5a
Diff: http://git-wip-us.apache.org/repos/asf/parquet-format/diff/84460c5a

Branch: refs/heads/master
Commit: 84460c5a1e8aadf52a40dcf2aeb2fc875df4ac2a
Parents: ddc18a7
Author: Ryan Blue <[email protected]>
Authored: Tue Oct 10 12:55:27 2017 -0700
Committer: Ryan Blue <[email protected]>
Committed: Tue Oct 10 12:55:27 2017 -0700

----------------------------------------------------------------------
 src/main/thrift/parquet.thrift | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/parquet-format/blob/84460c5a/src/main/thrift/parquet.thrift
----------------------------------------------------------------------
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index a4e193e..38cddc7 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -451,13 +451,20 @@ enum Encoding {

 /**
  * Supported compression algorithms.
+ *
+ * Codecs added in 2.3.2 can be read by readers based on 2.3.2 and later.
+ * Codec support may vary between readers based on the format version and
+ * libraries available at runtime. Gzip, Snappy, and LZ4 codecs are
+ * widely available, while Zstd and Brotli require additional libraries.
  */
 enum CompressionCodec {
   UNCOMPRESSED = 0;
   SNAPPY = 1;
   GZIP = 2;
   LZO = 3;
-  BROTLI = 4;
+  BROTLI = 4; // Added in 2.3.2
+  LZ4 = 5;    // Added in 2.3.2
+  ZSTD = 6;   // Added in 2.3.2
 }

 enum PageType {
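
The compatibility rule in the new doc comment (codecs added in 2.3.2 need a reader based on 2.3.2 or later, and Zstd and Brotli also need extra libraries at runtime) can be illustrated with a small Python sketch. This is not part of the commit or of any Parquet library: the `reader_supports` helper, the `ADDED_IN` and `NEEDS_EXTRA_LIBRARY` tables, and the library-name strings are hypothetical names chosen for illustration. Only the enum values and the 2.3.2 version come from the diff above.

```python
from enum import IntEnum


class CompressionCodec(IntEnum):
    """Mirror of the Thrift enum in the diff above."""
    UNCOMPRESSED = 0
    SNAPPY = 1
    GZIP = 2
    LZO = 3
    BROTLI = 4  # Added in 2.3.2
    LZ4 = 5     # Added in 2.3.2
    ZSTD = 6    # Added in 2.3.2


# Hypothetical table: format version in which each newer codec was added,
# taken from the "Added in 2.3.2" comments in the diff.
ADDED_IN = {
    CompressionCodec.BROTLI: (2, 3, 2),
    CompressionCodec.LZ4: (2, 3, 2),
    CompressionCodec.ZSTD: (2, 3, 2),
}

# Hypothetical table: codecs the doc comment says require additional
# libraries, mapped to an illustrative library name.
NEEDS_EXTRA_LIBRARY = {
    CompressionCodec.BROTLI: "brotli",
    CompressionCodec.ZSTD: "zstd",
}


def parse_version(version):
    """Turn a dotted version string such as '2.3.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def reader_supports(codec_id, reader_version, available_libs=frozenset()):
    """Return True if a reader at reader_version could decode codec_id.

    A sketch of the rule in the doc comment, not a real reader check.
    """
    try:
        codec = CompressionCodec(codec_id)
    except ValueError:
        # Codec id unknown to this version of the enum.
        return False
    minimum = ADDED_IN.get(codec)
    if minimum is not None and parse_version(reader_version) < minimum:
        return False
    library = NEEDS_EXTRA_LIBRARY.get(codec)
    if library is not None and library not in available_libs:
        return False
    return True
```

For example, under these assumptions `reader_supports(CompressionCodec.LZ4, "2.3.1")` is false because the reader predates the codec, while `reader_supports(CompressionCodec.ZSTD, "2.3.2", {"zstd"})` is true.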
