[ 
https://issues.apache.org/jira/browse/PARQUET-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147302#comment-17147302
 ] 

ASF GitHub Bot commented on PARQUET-1643:
-----------------------------------------

samarthjain commented on pull request #671:
URL: https://github.com/apache/parquet-mr/pull/671#issuecomment-650731696


   Force-pushed a new commit that makes the use of Airlift-based compressors 
configurable. Also added tests and GZIP benchmarks for the Airlift 
compressors. The benchmark results show no measurable performance 
improvement or regression for Airlift GZIP versus plain GZIP; every 
difference in score is smaller than the reported error: 
   ```
   PageChecksumReadBenchmarks.read10MRowsAirliftGzipWithVerification     3   6.431 ± 0.741
   PageChecksumReadBenchmarks.read10MRowsAirliftGzipWithoutVerification  3   6.605 ± 0.709
   PageChecksumReadBenchmarks.read10MRowsGzipWithVerification            3   6.468 ± 0.700
   PageChecksumReadBenchmarks.read10MRowsGzipWithoutVerification         3   6.583 ± 1.538
   
   PageChecksumWriteBenchmarks.write10MRowsAirliftGzipWithChecksums      3  36.333 ± 0.510
   PageChecksumWriteBenchmarks.write10MRowsAirliftGzipWithoutChecksums   3  36.069 ± 1.096
   PageChecksumWriteBenchmarks.write10MRowsGzipWithChecksums             3  36.141 ± 1.095
   PageChecksumWriteBenchmarks.write10MRowsGzipWithoutChecksums          3  36.174 ± 5.125
   
   ReadBenchmarks.read1MRowsDefaultBlockAndPageSizeAirliftGZIP           3   0.898 ± 1.254
   ReadBenchmarks.read1MRowsDefaultBlockAndPageSizeGZIP                  3   0.891 ± 1.201
   ```
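
As a rough sanity check on the numbers above, here is a minimal sketch (not part of the pull request; the class and method names are hypothetical) that treats each score ± error pair as an interval and reports whether the Airlift and plain GZIP intervals overlap. Overlapping intervals are consistent with there being no performance difference, but they do not prove it, especially with only 3 iterations per benchmark.

```java
// Hypothetical helper, not part of parquet-mr: checks whether two
// "score ± error" intervals taken from the benchmark output overlap.
class BenchmarkOverlap {

    // Returns true when [s1 - e1, s1 + e1] and [s2 - e2, s2 + e2] intersect,
    // which is equivalent to |s1 - s2| <= e1 + e2.
    static boolean overlaps(double s1, double e1, double s2, double e2) {
        return Math.abs(s1 - s2) <= e1 + e2;
    }

    public static void main(String[] args) {
        String[] labels = {
            "read10MRows with verification",
            "read10MRows without verification",
            "write10MRows with checksums",
            "write10MRows without checksums",
            "read1MRows default block and page size"
        };
        // Each row: Airlift score, Airlift error, plain GZIP score, plain GZIP error
        // (values copied from the benchmark output quoted above).
        double[][] rows = {
            {6.431, 0.741, 6.468, 0.700},
            {6.605, 0.709, 6.583, 1.538},
            {36.333, 0.510, 36.141, 1.095},
            {36.069, 1.096, 36.174, 5.125},
            {0.898, 1.254, 0.891, 1.201}
        };
        for (int i = 0; i < rows.length; i++) {
            double[] r = rows[i];
            System.out.println(labels[i] + ": intervals overlap = "
                + overlaps(r[0], r[1], r[2], r[3]));
        }
    }
}
```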
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Use airlift non-native implementations for GZIP, LZO and LZ4 codecs
> -------------------------------------------------------------------
>
>                 Key: PARQUET-1643
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1643
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>            Priority: Major
>              Labels: pull-request-available
>
> [~rdblue] pointed me to [https://github.com/airlift/aircompressor], which 
> provides non-native implementations of compression codecs. It claims to be 
> much faster than the native wrappers that Parquet uses. This Jira tracks the 
> work needed to explore using these codecs, gather benchmark results, and 
> make the required changes, including no longer needing to pool compressors 
> and decompressors. Note that this doesn't include SNAPPY, since Parquet 
> already has its own non-hadoopy implementation for it. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
