[
https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278673#comment-15278673
]
Ryan Blue commented on HADOOP-13126:
------------------------------------
The results in the description show the comparison with Snappy: the Brotli
file is less than half the size, and compression took about the same amount of
time. Comparing to LZ4 would be interesting, but LZ4 isn't supported by
Parquet, so it's a bit harder for me to drop into my test case.
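
The size and time claims can be checked directly against the numbers in the
issue description. This is only a minimal sketch in Python: the figures are
copied from the `time` and `ls -l` output quoted below, while the dictionary
layout and the `to_seconds` helper are illustrative names, not part of Parquet
or Hadoop.

```python
# Figures copied from the `time` and `ls -l` output in the issue description.
# The dictionary layout and helper name are illustrative assumptions.
results = {
    "snappy": {"bytes": 2265950833, "real": "1m17.106s"},
    "brotli": {"bytes": 1068821936, "real": "1m16.640s"},
    "gzip":   {"bytes": 1421601880, "real": "3m39.496s"},
}


def to_seconds(real):
    """Convert a bash `time` value such as '1m17.106s' to seconds."""
    minutes, seconds = real.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# Express every codec's output size and wall-clock time relative to Snappy.
baseline = results["snappy"]
for codec, r in results.items():
    size_ratio = r["bytes"] / baseline["bytes"]
    time_ratio = to_seconds(r["real"]) / to_seconds(baseline["real"])
    print(f"{codec:>6}: size x{size_ratio:.2f}, time x{time_ratio:.2f} vs snappy")
```

A Brotli size ratio below 0.5 with a time ratio close to 1.0 corresponds to
"less than half the size" at "about the same amount of time".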
> Add Brotli compression codec
> ----------------------------
>
> Key: HADOOP-13126
> URL: https://issues.apache.org/jira/browse/HADOOP-13126
> Project: Hadoop Common
> Issue Type: Improvement
> Components: io
> Reporter: Ryan Blue
> Assignee: Ryan Blue
> Attachments: HADOOP-13126.1.patch
>
>
> I've been testing [Brotli|https://github.com/google/brotli/], a new
> compression library based on LZ77 from Google. Google's [brotli
> benchmarks|https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf]
> look really good and we're also seeing a significant improvement in
> compression size, compression speed, or both.
> {code:title=Brotli preliminary test results}
> [blue@work Downloads]$ time parquet from test.parquet -o test.snappy.parquet --compression-codec snappy --overwrite
> real 1m17.106s
> user 1m30.804s
> sys 0m4.404s
> [blue@work Downloads]$ time parquet from test.parquet -o test.br.parquet --compression-codec brotli --overwrite
> real 1m16.640s
> user 1m24.244s
> sys 0m6.412s
> [blue@work Downloads]$ time parquet from test.parquet -o test.gz.parquet --compression-codec gzip --overwrite
> real 3m39.496s
> user 3m48.736s
> sys 0m3.880s
> [blue@work Downloads]$ ls -l
> -rw-r--r-- 1 blue blue 1068821936 May 10 11:06 test.br.parquet
> -rw-r--r-- 1 blue blue 1421601880 May 10 11:10 test.gz.parquet
> -rw-r--r-- 1 blue blue 2265950833 May 10 10:30 test.snappy.parquet
> {code}
> Brotli, at quality 1, is as fast as snappy and ends up smaller than gzip-9.
> Another test resulted in a slightly larger Brotli file than gzip produced,
> but Brotli was 4x faster. I'd like to get this compression codec into Hadoop.
> [Brotli is licensed with the MIT
> license|https://github.com/google/brotli/blob/master/LICENSE], and the [JNI
> library jbrotli is
> ALv2|https://github.com/MeteoGroup/jbrotli/blob/master/LICENSE].