[
https://issues.apache.org/jira/browse/PARQUET-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516909#comment-17516909
]
Timothy Miller commented on PARQUET-2135:
-----------------------------------------
Extra note:
PlainValuesReader still includes an unused LittleEndianDataInputStream member
because removing it makes the build fail, reporting an incompatible API
change.
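
As a rough illustration of that constraint, here is a minimal sketch. It assumes a hypothetical class, CompatReaderSketch, with a protected DataInputStream field standing in for the real member; it is not the actual PlainValuesReader code. The old field stays declared so that code compiled against an earlier release still links, while the read path uses a ByteBuffer instead.

```java
import java.io.DataInputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for a public reader class; names are illustrative only.
class CompatReaderSketch {
    /**
     * No longer read by this class. It stays declared because subclasses
     * compiled against an earlier release may still reference it.
     */
    @Deprecated
    protected DataInputStream in;

    // State that the read path actually uses.
    private ByteBuffer buffer;

    void initFromPage(byte[] page) {
        buffer = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN);
    }

    int readInteger() {
        return buffer.getInt();
    }

    public static void main(String[] args) {
        CompatReaderSketch reader = new CompatReaderSketch();
        reader.initFromPage(new byte[] {1, 0, 0, 0});
        System.out.println(reader.readInteger()); // prints 1
    }
}
```

Removing a protected field changes the class's visible surface, which is the kind of change a compatibility check in a build would be expected to flag.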
> Performance optimizations: Merged all LittleEndianDataInputStream
> functionality into ByteBufferInputStream
> ----------------------------------------------------------------------------------------------------------
>
> Key: PARQUET-2135
> URL: https://issues.apache.org/jira/browse/PARQUET-2135
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.12.2
> Reporter: Timothy Miller
> Priority: Major
>
> This PR is all performance optimization. In benchmarking with Trino, we find
> that query performance improves by 5% to 15%, depending on the query, and
> that includes all the I/O time from S3.
> The main modification is to merge all of LittleEndianDataInputStream
> functionality into ByteBufferInputStream, which yields the following benefits:
> * Eliminates extra layers of abstraction and method call overhead
> * Enables the use of intrinsics for readInt, readLong, etc.
> * Provides faster access methods like readFully and skipFully,
> without the need for helper functions
> * Reduces some object creation in the performance-critical path
> This also includes and enables performance optimizations to:
> * ByteBitPackingValuesReader
> * PlainValuesReader
> * RunLengthBitPackingHybridDecoder
> Context:
> I've been working on improving Parquet reading performance in Trino, mostly
> by profiling while running performance benchmarks and TPCDS queries. This PR
> is a subset of the changes I made that have more than doubled the performance
> of a lot of TPCDS queries (wall clock time, including the S3 access time). If
> you are kind enough to accept these changes, I have more I would like to
> contribute.
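
To make the main modification in the description more concrete, the following is a minimal sketch of the two styles of little-endian read. The class and method names (LittleEndianReadSketch, readIntByteByByte, readIntDirect) are hypothetical and are not taken from the PR. The first method assembles an int from four single-byte reads, as a layered stream wrapper might. The second issues a single getInt on a ByteBuffer set to little-endian order, which the JIT may be able to compile to a plain load.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration; not the code from the PR.
final class LittleEndianReadSketch {

    // Layered style: build the int from four separate byte reads.
    static int readIntByteByByte(ByteBuffer buf) {
        int b0 = buf.get() & 0xFF;
        int b1 = buf.get() & 0xFF;
        int b2 = buf.get() & 0xFF;
        int b3 = buf.get() & 0xFF;
        return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }

    // Direct style: one bulk read; the buffer must be in LITTLE_ENDIAN order.
    static int readIntDirect(ByteBuffer buf) {
        return buf.getInt();
    }

    public static void main(String[] args) {
        byte[] data = {0x78, 0x56, 0x34, 0x12};
        ByteBuffer plain = ByteBuffer.wrap(data);
        ByteBuffer littleEndian = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        System.out.println(Integer.toHexString(readIntByteByByte(plain)));  // 12345678
        System.out.println(Integer.toHexString(readIntDirect(littleEndian))); // 12345678
    }
}
```

Both methods decode the same value. The difference is only in how many calls and bounds checks sit between the caller and the underlying bytes.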