[
https://issues.apache.org/jira/browse/DRILL-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441088#comment-16441088
]
salim achouche edited comment on DRILL-6301 at 4/17/18 4:12 PM:
----------------------------------------------------------------
[~vrozov]
* Test-1 and Test-2 were not Parquet specific; Test-3 is Parquet specific
* The goal of Test-1 and Test-2 was to assess the efficiency of multiple
memory access layers
** Direct memory through Java Unsafe
** Direct memory through Netty
** Direct memory through DrillBuf
** Hybrid
* In Test-1, I used the nextByte() method because of its simplicity
* In Test-2, I switched to a real-life use case, pattern matching
* My goal was to use lightweight transformations so as to stress the memory
access layer as much as possible (a minimal benchmark sketch along these lines
follows below)
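Below is a minimal JMH-style sketch of the kind of comparison described above
(sequential nextByte()-style access over direct memory via Unsafe vs. a heap
byte array). It assumes JMH and sun.misc.Unsafe are available; the class and
method names are illustrative, and this is not the actual Test-1/Test-2
harness. The Netty and DrillBuf layers would follow the same shape with their
respective getByte() calls.
{code:java}
import org.openjdk.jmh.annotations.*;
import sun.misc.Unsafe;
import java.lang.reflect.Field;
import java.util.concurrent.TimeUnit;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MemoryAccessBench {
  private static final Unsafe UNSAFE = loadUnsafe();
  private static final int LEN = 1 << 20;

  private long directAddr;  // raw direct memory (Unsafe access layer)
  private byte[] heapBuf;   // heap byte array (hybrid / intermediary layer)

  @Setup
  public void setup() {
    directAddr = UNSAFE.allocateMemory(LEN);
    heapBuf = new byte[LEN];
    for (int i = 0; i < LEN; i++) {
      UNSAFE.putByte(directAddr + i, (byte) i);
      heapBuf[i] = (byte) i;
    }
  }

  @TearDown
  public void tearDown() {
    UNSAFE.freeMemory(directAddr);
  }

  @Benchmark
  public long directNextByte() {
    long sum = 0;
    for (int i = 0; i < LEN; i++) {
      sum += UNSAFE.getByte(directAddr + i);  // nextByte()-style sequential read
    }
    return sum;
  }

  @Benchmark
  public long heapNextByte() {
    long sum = 0;
    for (int i = 0; i < LEN; i++) {
      sum += heapBuf[i];
    }
    return sum;
  }

  private static Unsafe loadUnsafe() {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new AssertionError(e);
    }
  }
}
{code}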
> Parquet Performance Analysis
> ----------------------------
>
> Key: DRILL-6301
> URL: https://issues.apache.org/jira/browse/DRILL-6301
> Project: Apache Drill
> Issue Type: Task
> Components: Storage - Parquet
> Reporter: salim achouche
> Assignee: salim achouche
> Priority: Major
> Fix For: 1.14.0
>
>
> _*Description -*_
> * DRILL-5846 is meant to improve the Flat Parquet reader performance
> * The associated implementation resulted in a 2x - 4x performance improvement
> * However, during the review process ([pull
> request|https://github.com/apache/drill/pull/1060]), a few key questions arose
>
> *_Intermediary Processing via Direct Memory vs Byte Arrays_*
> * The main reasons for using byte arrays for intermediary processing are to
> a) avoid the high cost of the DrillBuf checks (especially the reference
> counting) and b) benefit from Java optimizations observed when accessing
> byte arrays
> * Starting with version 1.12.0, the DrillBuf enablement checks have been
> refined so that memory access and reference counting checks can be enabled
> independently
> * Benchmarking Java's unsafe direct memory access with JMH indicates that the
> performance gap between heap and direct memory is very narrow except for a
> few use-cases
> * There are also concerns that the extra copy step (from direct memory into
> byte arrays) will have a negative effect on performance; note that this
> overhead was not observed with Intel's VTune as the intermediary buffers were
> a) pinned to a single CPU, b) reused, and c) small enough to remain in the L1
> cache during columnar processing (see the sketch after this list).
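> The snippet below is a minimal sketch of this intermediary-buffer pattern: a
> small, reused heap array that receives one bulk copy out of direct memory per
> chunk, so the working set stays in the L1 cache. The chunk size, class, and
> method names are illustrative assumptions, not Drill's actual reader code.
> {code:java}
> import java.nio.ByteBuffer;
>
> public class IntermediaryCopyExample {
>   // Small, reused scratch buffer, sized to stay resident in the L1 cache.
>   private final byte[] scratch = new byte[4 * 1024];
>
>   /** Processes a direct buffer chunk-by-chunk through the reused heap array. */
>   public long process(ByteBuffer direct) {
>     long checksum = 0;
>     while (direct.hasRemaining()) {
>       int len = Math.min(scratch.length, direct.remaining());
>       direct.get(scratch, 0, len);      // one bulk copy out of direct memory
>       for (int i = 0; i < len; i++) {   // columnar-style work on the heap copy
>         checksum += scratch[i];
>       }
>     }
>     return checksum;
>   }
> }
> {code}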
> _*Goal*_
> * The Flat Parquet reader is one of the few Drill columnar operators
> * It is imperative that we agree on the most optimal processing pattern so
> that the decisions taken within this Jira apply not only to Parquet but to
> all Drill columnar operators
> _*Methodology*_
> # Assess the performance impact of using intermediary byte arrays (as
> described above)
> # Prototype a solution using Direct Memory under three configurations: all
> DrillBuf checks off, only access checks on, and all checks on
> # Make an educated decision on which processing pattern should be adopted
> # Decide whether it is acceptable to use Java's unsafe API (and through what
> mechanism) on byte arrays, when the use of byte arrays is a necessity (see
> the illustrative sketch below)
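> The following sketch illustrates one such mechanism: reading from a heap byte
> array through sun.misc.Unsafe using the array base offset. It is an
> illustration only, not a decision or Drill's actual code; the class and
> method names are assumptions.
> {code:java}
> import sun.misc.Unsafe;
> import java.lang.reflect.Field;
>
> public class UnsafeArrayAccessExample {
>   private static final Unsafe UNSAFE = loadUnsafe();
>   private static final long BYTE_ARRAY_OFFSET = Unsafe.ARRAY_BYTE_BASE_OFFSET;
>
>   /** Reads an int (native byte order) from the array at the given byte index. */
>   public static int getInt(byte[] buf, int index) {
>     return UNSAFE.getInt(buf, BYTE_ARRAY_OFFSET + index);
>   }
>
>   private static Unsafe loadUnsafe() {
>     try {
>       Field f = Unsafe.class.getDeclaredField("theUnsafe");
>       f.setAccessible(true);
>       return (Unsafe) f.get(null);
>     } catch (ReflectiveOperationException e) {
>       throw new AssertionError(e);
>     }
>   }
> }
> {code}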
>
>
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)