[
https://issues.apache.org/jira/browse/DRILL-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441088#comment-16441088
]
salim achouche edited comment on DRILL-6301 at 4/17/18 4:12 PM:
----------------------------------------------------------------
[~vrozov]
* Test-1 and Test-2 were not Parquet specific; Test-3 is Parquet specific
* The goal of Test-1 and Test-2 was to assess the efficiency of multiple
memory access layers
** Direct memory through Java Unsafe
** Direct memory through Netty
** Direct memory through DrillBuf
** Hybrid
* In Test-1, I used the nextByte() method because of its simplicity
* In Test-2, I switched to a real-life use case, pattern matching
* My goal was to use lightweight transformations so as to stress the memory
access layer as much as possible (a minimal benchmark sketch along these lines
follows below)
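Below is a minimal JMH-style sketch of the kind of comparison described above
(sequential nextByte()-style access over direct memory via Unsafe vs. a heap
byte array). It assumes JMH and sun.misc.Unsafe are available; the class and
method names are illustrative, and this is not the actual Test-1/Test-2
harness. The Netty and DrillBuf layers would follow the same shape with their
respective getByte() calls.
{code:java}
import org.openjdk.jmh.annotations.*;
import sun.misc.Unsafe;
import java.lang.reflect.Field;
import java.util.concurrent.TimeUnit;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MemoryAccessBench {
  private static final Unsafe UNSAFE = loadUnsafe();
  private static final int LEN = 1 << 20;

  private long directAddr;  // raw direct memory (Unsafe access layer)
  private byte[] heapBuf;   // heap byte array (hybrid / intermediary layer)

  @Setup
  public void setup() {
    directAddr = UNSAFE.allocateMemory(LEN);
    heapBuf = new byte[LEN];
    for (int i = 0; i < LEN; i++) {
      UNSAFE.putByte(directAddr + i, (byte) i);
      heapBuf[i] = (byte) i;
    }
  }

  @TearDown
  public void tearDown() {
    UNSAFE.freeMemory(directAddr);
  }

  @Benchmark
  public long directNextByte() {
    long sum = 0;
    for (int i = 0; i < LEN; i++) {
      sum += UNSAFE.getByte(directAddr + i);  // nextByte()-style sequential read
    }
    return sum;
  }

  @Benchmark
  public long heapNextByte() {
    long sum = 0;
    for (int i = 0; i < LEN; i++) {
      sum += heapBuf[i];
    }
    return sum;
  }

  private static Unsafe loadUnsafe() {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new AssertionError(e);
    }
  }
}
{code}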
> Parquet Performance Analysis
> ----------------------------
>
> Key: DRILL-6301
> URL: https://issues.apache.org/jira/browse/DRILL-6301
> Project: Apache Drill
> Issue Type: Task
> Components: Storage - Parquet
> Reporter: salim achouche
> Assignee: salim achouche
> Priority: Major
> Fix For: 1.14.0
>
>
> _*Description -*_
> * DRILL-5846 is meant to improve the Flat Parquet reader performance
> * The associated implementation resulted in a 2x - 4x performance improvement
> * However, during the review process ([pull
> request|https://github.com/apache/drill/pull/1060]), a few key questions arose
>
> *_Intermediary Processing via Direct Memory vs Byte Arrays_*
> * The main reasons for using byte arrays for intermediary processing are to
> a) avoid the high cost of the DrillBuf checks (especially the reference
> counting) and b) benefit from Java optimizations observed when accessing
> byte arrays
> * Starting with version 1.12.0, the DrillBuf enablement checks have been
> refined so that memory access and reference counting checks can be enabled
> independently
> * Benchmarking Java's unsafe direct memory access with JMH indicates that the
> performance gap between heap and direct memory is very narrow except for a
> few use-cases
> * There are also concerns that the extra copy step (from direct memory into
> byte arrays) will have a negative effect on performance; note that this
> overhead was not observed with Intel's VTune as the intermediary buffers were
> a) pinned to a single CPU, b) reused, and c) small enough to remain in the L1
> cache during columnar processing (see the sketch after this list).
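> The snippet below is a minimal sketch of this intermediary-buffer pattern: a
> small, reused heap array that receives one bulk copy out of direct memory per
> chunk, so the working set stays in the L1 cache. The chunk size, class, and
> method names are illustrative assumptions, not Drill's actual reader code.
> {code:java}
> import java.nio.ByteBuffer;
>
> public class IntermediaryCopyExample {
>   // Small, reused scratch buffer, sized to stay resident in the L1 cache.
>   private final byte[] scratch = new byte[4 * 1024];
>
>   /** Processes a direct buffer chunk-by-chunk through the reused heap array. */
>   public long process(ByteBuffer direct) {
>     long checksum = 0;
>     while (direct.hasRemaining()) {
>       int len = Math.min(scratch.length, direct.remaining());
>       direct.get(scratch, 0, len);      // one bulk copy out of direct memory
>       for (int i = 0; i < len; i++) {   // columnar-style work on the heap copy
>         checksum += scratch[i];
>       }
>     }
>     return checksum;
>   }
> }
> {code}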
> _*Goal*_
> * The Flat Parquet reader is one of the few Drill columnar operators
> * It is imperative that we agree on the most optimal processing pattern so
> that the decisions taken within this Jira apply not only to Parquet but to
> all Drill columnar operators
> _*Methodology*_
> # Assess the performance impact of using intermediary byte arrays (as
> described above)
> # Prototype a solution using Direct Memory under three configurations: all
> DrillBuf checks off, only access checks on, and all checks on
> # Make an educated decision on which processing pattern should be adopted
> # Decide whether it is acceptable to use Java's unsafe API (and through what
> mechanism) on byte arrays, when the use of byte arrays is a necessity (see
> the illustrative sketch below)
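> The following sketch illustrates one such mechanism: reading from a heap byte
> array through sun.misc.Unsafe using the array base offset. It is an
> illustration only, not a decision or Drill's actual code; the class and
> method names are assumptions.
> {code:java}
> import sun.misc.Unsafe;
> import java.lang.reflect.Field;
>
> public class UnsafeArrayAccessExample {
>   private static final Unsafe UNSAFE = loadUnsafe();
>   private static final long BYTE_ARRAY_OFFSET = Unsafe.ARRAY_BYTE_BASE_OFFSET;
>
>   /** Reads an int (native byte order) from the array at the given byte index. */
>   public static int getInt(byte[] buf, int index) {
>     return UNSAFE.getInt(buf, BYTE_ARRAY_OFFSET + index);
>   }
>
>   private static Unsafe loadUnsafe() {
>     try {
>       Field f = Unsafe.class.getDeclaredField("theUnsafe");
>       f.setAccessible(true);
>       return (Unsafe) f.get(null);
>     } catch (ReflectiveOperationException e) {
>       throw new AssertionError(e);
>     }
>   }
> }
> {code}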
>
>
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)