[
https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295142#comment-16295142
]
Lee Blum commented on HADOOP-13126:
-----------------------------------
[~rdblue] We are also interested in this feature. In which Parquet version
can we expect it to be available? Brotli delivers excellent compression
ratios with low CPU consumption, which would benefit many users, including
our own use case.
> Add Brotli compression codec
> ----------------------------
>
> Key: HADOOP-13126
> URL: https://issues.apache.org/jira/browse/HADOOP-13126
> Project: Hadoop Common
> Issue Type: Improvement
> Components: io
> Affects Versions: 2.7.2
> Reporter: Ryan Blue
> Assignee: Ryan Blue
> Attachments: HADOOP-13126.1.patch, HADOOP-13126.2.patch,
> HADOOP-13126.3.patch, HADOOP-13126.4.patch, HADOOP-13126.5.patch
>
>
> I've been testing [Brotli|https://github.com/google/brotli/], a new
> LZ77-based compression library from Google. Google's [brotli
> benchmarks|https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf]
> look really good, and we're also seeing a significant improvement in
> compressed size, compression speed, or both.
> {code:title=Brotli preliminary test results}
> [blue@work Downloads]$ time parquet from test.parquet -o test.snappy.parquet --compression-codec snappy --overwrite
> real 1m17.106s
> user 1m30.804s
> sys 0m4.404s
> [blue@work Downloads]$ time parquet from test.parquet -o test.br.parquet --compression-codec brotli --overwrite
> real 1m16.640s
> user 1m24.244s
> sys 0m6.412s
> [blue@work Downloads]$ time parquet from test.parquet -o test.gz.parquet --compression-codec gzip --overwrite
> real 3m39.496s
> user 3m48.736s
> sys 0m3.880s
> [blue@work Downloads]$ ls -l
> -rw-r--r-- 1 blue blue 1068821936 May 10 11:06 test.br.parquet
> -rw-r--r-- 1 blue blue 1421601880 May 10 11:10 test.gz.parquet
> -rw-r--r-- 1 blue blue 2265950833 May 10 10:30 test.snappy.parquet
> {code}
> Brotli, at quality 1, is as fast as snappy and ends up smaller than gzip-9.
> Another test resulted in a slightly larger Brotli file than gzip produced,
> but Brotli was 4x faster. I'd like to get this compression codec into Hadoop.
> [Brotli is licensed with the MIT
> license|https://github.com/google/brotli/blob/master/LICENSE], and the [JNI
> library jbrotli is
> ALv2|https://github.com/MeteoGroup/jbrotli/blob/master/LICENSE].
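For readers who want the preliminary results above as ratios, here is a minimal sketch that normalizes the quoted file sizes and wall-clock times against the Snappy run. The byte counts and `real` times are copied from the test output in the description; the helper name `relative` is only illustrative. The size of the input `test.parquet` is not shown, so this compares the codecs to each other and does not give an absolute compression ratio.

```python
# File sizes in bytes, copied from the `ls -l` output in the issue description.
sizes = {
    "brotli": 1068821936,
    "gzip": 1421601880,
    "snappy": 2265950833,
}

# Wall-clock ("real") times in seconds, copied from the `time` output.
real_seconds = {
    "brotli": 1 * 60 + 16.640,
    "gzip": 3 * 60 + 39.496,
    "snappy": 1 * 60 + 17.106,
}


def relative(values, baseline):
    """Return each value divided by the baseline codec's value."""
    base = values[baseline]
    return {name: value / base for name, value in values.items()}


size_vs_snappy = relative(sizes, "snappy")
time_vs_snappy = relative(real_seconds, "snappy")

for codec in sorted(sizes):
    print(
        f"{codec:>6}: {size_vs_snappy[codec]:.2f}x snappy size, "
        f"{time_vs_snappy[codec]:.2f}x snappy wall time"
    )
```

On these numbers the Brotli output is a little under half the size of the Snappy output at essentially the same wall-clock time, while gzip takes close to three times as long and still produces a larger file than Brotli. This is a single run on one file, so treat the ratios as indicative only.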