[
https://issues.apache.org/jira/browse/DRILL-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
salim achouche resolved DRILL-6301.
-----------------------------------
Resolution: Fixed
Reviewer: Pritesh Maker
This is an analytical task.
> Parquet Performance Analysis
> ----------------------------
>
> Key: DRILL-6301
> URL: https://issues.apache.org/jira/browse/DRILL-6301
> Project: Apache Drill
> Issue Type: Task
> Components: Storage - Parquet
> Reporter: salim achouche
> Assignee: salim achouche
> Priority: Major
> Fix For: 1.14.0
>
>
> _*Description -*_
> * DRILL-5846 is meant to improve the Flat Parquet reader performance
> * The associated implementation resulted in a 2x - 4x performance improvement
> * However, during the review process ([pull
> request|https://github.com/apache/drill/pull/1060]), a few key questions arose
>
> *_Intermediary Processing via Direct Memory vs Byte Arrays_*
> * The main reasons for using byte arrays for intermediary processing are to
> a) avoid the high cost of the DrillBuf checks (especially the reference
> counting) and b) benefit from some observed Java optimizations when accessing
> byte arrays
> * Starting with version 1.12.0, the DrillBuf enablement checks have been
> refined so that memory access and reference counting checks can be enabled
> independently
> * Benchmarking of Java's Direct Memory unsafe methods using JMH indicates the
> performance gap between heap and direct memory is very narrow except for a few
> use cases
> * There are also concerns that the extra copy step (from direct memory into
> byte arrays) will have a negative effect on performance; note that this
> overhead was not observed using Intel's VTune as the intermediary buffers were
> a) pinned to a single CPU, b) reused, and c) small enough to remain in the L1
> cache during columnar processing.
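
The two access patterns discussed above can be sketched roughly as follows. This is a minimal illustration, not the reader's actual code: `java.nio.ByteBuffer` stands in for DrillBuf, and the class name, method names, and the 4 KiB chunk size are all assumptions made for the example.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: per-element reads on direct memory versus bulk copies
// into a small, reused byte array. ByteBuffer is a stand-in for DrillBuf.
final class IntermediaryBufferSketch {
    // Illustrative size, chosen small so the scratch array can stay in L1 cache.
    private static final int CHUNK_SIZE = 4096;

    // Allocated once and reused for every chunk.
    private final byte[] scratch = new byte[CHUNK_SIZE];

    /** Sums all bytes by reading the direct buffer one element at a time. */
    static long sumDirect(ByteBuffer direct) {
        long sum = 0;
        for (int i = 0; i < direct.limit(); i++) {
            sum += direct.get(i); // absolute read, checked on every call
        }
        return sum;
    }

    /** Sums all bytes by bulk-copying each chunk into the reusable byte array. */
    long sumViaByteArray(ByteBuffer direct) {
        ByteBuffer view = direct.duplicate(); // shares content, own position
        view.rewind();
        long sum = 0;
        while (view.hasRemaining()) {
            int len = Math.min(scratch.length, view.remaining());
            view.get(scratch, 0, len); // the extra copy step
            for (int i = 0; i < len; i++) {
                sum += scratch[i]; // plain array access
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(10_000);
        for (int i = 0; i < buf.capacity(); i++) {
            buf.put(i, (byte) (i % 7));
        }
        System.out.println(sumDirect(buf));
        System.out.println(new IntermediaryBufferSketch().sumViaByteArray(buf));
    }
}
```

Both methods compute the same result; the open question in this Jira is only which pattern is faster under each DrillBuf check configuration, which a sketch like this cannot answer without a proper benchmark harness such as JMH.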
> _*Goal*_
> * The Flat Parquet reader is one of the few Drill columnar operators
> * It is imperative that we agree on the optimal processing pattern so
> that the decisions that we take within this Jira are not only applied to
> Parquet but to all Drill columnar operators
> _*Methodology*_
> # Assess the performance impact of using intermediary byte arrays (as
> described above)
> # Prototype a solution using Direct Memory under three DrillBuf check
> configurations: all checks off, access checks on, and all checks on
> # Make an educated decision on which processing pattern should be adopted
> # Decide whether it is ok to use Java's unsafe API (and through what
> mechanism) on byte arrays (when the use of byte arrays is a necessity)
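
The three check configurations in step 2 could be modeled along the following lines. This is a hypothetical sketch only: `CheckedBufferSketch`, its two flags, and its methods are invented for illustration and do not reflect DrillBuf's real API or how Drill actually enables these checks.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical buffer wrapper whose access checks and reference-count checks
// can be switched on and off independently.
final class CheckedBufferSketch {
    private final ByteBuffer data;
    private final AtomicInteger refCount = new AtomicInteger(1);
    private final boolean accessChecks;
    private final boolean refCountChecks;

    CheckedBufferSketch(int capacity, boolean accessChecks, boolean refCountChecks) {
        this.data = ByteBuffer.allocateDirect(capacity);
        this.accessChecks = accessChecks;
        this.refCountChecks = refCountChecks;
    }

    /** Drops one reference; the buffer counts as released at zero. */
    void release() {
        refCount.decrementAndGet();
    }

    private void check(int index) {
        if (refCountChecks && refCount.get() <= 0) {
            throw new IllegalStateException("buffer already released");
        }
        if (accessChecks && (index < 0 || index >= data.capacity())) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }

    byte getByte(int index) {
        check(index);
        // Note: the ByteBuffer stand-in still bounds-checks internally, so the
        // flags here only illustrate the control flow, not the real cost.
        return data.get(index);
    }

    void setByte(int index, byte value) {
        check(index);
        data.put(index, value);
    }

    public static void main(String[] args) {
        // The three configurations named in the methodology.
        CheckedBufferSketch checksOff = new CheckedBufferSketch(8, false, false);
        CheckedBufferSketch accessOnly = new CheckedBufferSketch(8, true, false);
        CheckedBufferSketch allChecks = new CheckedBufferSketch(8, true, true);
        checksOff.setByte(0, (byte) 1);
        accessOnly.setByte(0, (byte) 2);
        allChecks.setByte(0, (byte) 3);
        System.out.println(checksOff.getByte(0) + accessOnly.getByte(0) + allChecks.getByte(0));
    }
}
```

Running the same read loop against each configuration would give the per-configuration comparison that the methodology calls for.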
>
>
>
>