[ 
https://issues.apache.org/jira/browse/HADOOP-13126?focusedWorklogId=557959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-557959
 ]

ASF GitHub Bot logged work on HADOOP-13126:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Feb/21 15:02
            Start Date: 25/Feb/21 15:02
    Worklog Time Spent: 10m 
      Work Description: martin-g opened a new pull request #2723:
URL: https://github.com/apache/hadoop/pull/2723


   Adds BrotliCodec, a compression codec based on [Google Brotli](https://github.com/google/brotli).
   
   This PR is a continuation of the work done by @rdblue in https://issues.apache.org/jira/browse/HADOOP-13126.
   His patches were based on the [jbrotli](https://github.com/MeteoGroup/jbrotli) library, but that library has not been maintained for a few years, so this PR uses [Brotli4j](https://github.com/hyperxpro/Brotli4j) instead.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.


Issue Time Tracking
-------------------

            Worklog Id:     (was: 557959)
    Remaining Estimate: 0h
            Time Spent: 10m

> Add Brotli compression codec
> ----------------------------
>
>                 Key: HADOOP-13126
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13126
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 2.7.2
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>            Priority: Major
>         Attachments: HADOOP-13126.1.patch, HADOOP-13126.2.patch, 
> HADOOP-13126.3.patch, HADOOP-13126.4.patch, HADOOP-13126.5.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've been testing [Brotli|https://github.com/google/brotli/], a new LZ77-based
> compression library from Google. Google's
> [brotli benchmarks|https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf]
> look really good, and we're also seeing a significant improvement in
> compressed size, compression speed, or both.
> {code:title=Brotli preliminary test results}
> [blue@work Downloads]$ time parquet from test.parquet -o test.snappy.parquet --compression-codec snappy --overwrite
> real    1m17.106s
> user    1m30.804s
> sys     0m4.404s
> [blue@work Downloads]$ time parquet from test.parquet -o test.br.parquet --compression-codec brotli --overwrite
> real    1m16.640s
> user    1m24.244s
> sys     0m6.412s
> [blue@work Downloads]$ time parquet from test.parquet -o test.gz.parquet --compression-codec gzip --overwrite
> real    3m39.496s
> user    3m48.736s
> sys     0m3.880s
> [blue@work Downloads]$ ls -l
> -rw-r--r-- 1 blue blue 1068821936 May 10 11:06 test.br.parquet
> -rw-r--r-- 1 blue blue 1421601880 May 10 11:10 test.gz.parquet
> -rw-r--r-- 1 blue blue 2265950833 May 10 10:30 test.snappy.parquet
> {code}
> Brotli, at quality 1, is as fast as snappy and ends up smaller than gzip-9.
> In another test, Brotli produced a slightly larger file than gzip but was 4x
> faster. I'd like to get this compression codec into Hadoop.
> [Brotli is licensed under the MIT license|https://github.com/google/brotli/blob/master/LICENSE],
> and the [JNI library jbrotli is ALv2|https://github.com/MeteoGroup/jbrotli/blob/master/LICENSE].
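
The numbers in the "Brotli preliminary test results" listing can be reduced to relative figures with a little arithmetic. The Java sketch below does nothing beyond re-deriving ratios from the `ls -l` sizes and the `real` times quoted above. The class and method names are hypothetical. Each codec was timed in a single run, so the time ratios are indicative only and should not be read as a benchmark.

```java
// Hypothetical helper: re-derives ratios from the numbers in the
// "Brotli preliminary test results" listing (sizes from `ls -l`, times from `real`).
class BrotliResults {
    // Convert a `time` reading such as 3m39.496s into seconds.
    static double toSeconds(int minutes, double seconds) {
        return minutes * 60 + seconds;
    }

    // Size of `a` as a percentage of the size of `b`.
    static double percentOf(long a, long b) {
        return 100.0 * a / b;
    }

    // How many times longer the `slower` run took than the `faster` one.
    static double timeRatio(double slowerSecs, double fasterSecs) {
        return slowerSecs / fasterSecs;
    }

    public static void main(String[] args) {
        long brotliBytes = 1_068_821_936L; // test.br.parquet
        long gzipBytes = 1_421_601_880L;   // test.gz.parquet
        long snappyBytes = 2_265_950_833L; // test.snappy.parquet

        double brotliSecs = toSeconds(1, 16.640); // real 1m16.640s
        double gzipSecs = toSeconds(3, 39.496);   // real 3m39.496s
        double snappySecs = toSeconds(1, 17.106); // real 1m17.106s

        System.out.printf("brotli size vs gzip:   %.1f%%%n", percentOf(brotliBytes, gzipBytes));
        System.out.printf("brotli size vs snappy: %.1f%%%n", percentOf(brotliBytes, snappyBytes));
        System.out.printf("gzip time / brotli:    %.2fx%n", timeRatio(gzipSecs, brotliSecs));
        System.out.printf("snappy time / brotli:  %.2fx%n", timeRatio(snappySecs, brotliSecs));
    }
}
```

When run unmodified, it reports the Brotli file at about 75% of the size of the gzip file and about 47% of the size of the snappy file. The gzip run took roughly 2.9 times as long as the Brotli run, and the snappy run took about the same time.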



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
