[GitHub] [iceberg] bryanck opened a new pull request, #5137: Arrow: Avoid extra dictionary buffer copy

GitBox Mon, 27 Jun 2022 04:35:33 -0700


bryanck opened a new pull request, #5137:
URL: https://github.com/apache/iceberg/pull/5137


   This PR changes the dictionary value accessors in the vectorized Parquet 
reader to read values directly from the underlying dictionary, rather than 
copying them into a new buffer first (the dictionary decimal accessor classes 
already work this way). The underlying Parquet dictionary classes already load 
the values into a buffer, so copying them into a second buffer appears 
redundant.
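
   To illustrate the difference, here is a minimal, self-contained sketch. It 
is not the code from this PR: `LongDictionary`, `ArrayLongDictionary`, 
`CopyingAccessor`, and `DirectAccessor` are hypothetical stand-ins invented for 
the example, and the real accessor and dictionary classes differ. It only shows 
the general idea of decoding through the dictionary on each read instead of 
materializing a second copy of the values up front.

```java
public class DictionaryAccessSketch {

  // Hypothetical stand-in for a dictionary whose values are already decoded
  // into an internal buffer. Not the real Parquet or Iceberg API.
  interface LongDictionary {
    long decodeToLong(int id);

    int getMaxId();
  }

  // Simple array-backed dictionary used only for this illustration.
  static final class ArrayLongDictionary implements LongDictionary {
    private final long[] values;

    ArrayLongDictionary(long[] values) {
      this.values = values;
    }

    @Override
    public long decodeToLong(int id) {
      return values[id];
    }

    @Override
    public int getMaxId() {
      return values.length - 1;
    }
  }

  // "Before" pattern: copy every dictionary value into a second buffer,
  // then resolve row ids against that copy.
  static final class CopyingAccessor {
    private final long[] copied;
    private final int[] ids;

    CopyingAccessor(LongDictionary dictionary, int[] ids) {
      this.copied = new long[dictionary.getMaxId() + 1];
      for (int i = 0; i < copied.length; i++) {
        copied[i] = dictionary.decodeToLong(i);
      }
      this.ids = ids;
    }

    long getLong(int rowId) {
      return copied[ids[rowId]];
    }
  }

  // "After" pattern: keep a reference to the dictionary and decode on read,
  // so no extra buffer is allocated or filled.
  static final class DirectAccessor {
    private final LongDictionary dictionary;
    private final int[] ids;

    DirectAccessor(LongDictionary dictionary, int[] ids) {
      this.dictionary = dictionary;
      this.ids = ids;
    }

    long getLong(int rowId) {
      return dictionary.decodeToLong(ids[rowId]);
    }
  }

  public static void main(String[] args) {
    LongDictionary dictionary = new ArrayLongDictionary(new long[] {10L, 20L, 30L});
    int[] ids = {2, 0, 1, 2}; // dictionary-encoded column: one id per row

    CopyingAccessor copying = new CopyingAccessor(dictionary, ids);
    DirectAccessor direct = new DirectAccessor(dictionary, ids);

    // Both accessors return the same values; only the copy differs.
    for (int row = 0; row < ids.length; row++) {
      System.out.println(copying.getLong(row) + " == " + direct.getLong(row));
    }
  }
}
```

   Both accessors return identical values for every row; the direct variant 
simply skips the allocation and the copy loop in the constructor.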
   
   In very limited testing, this improves vectorized read performance by over 
20% in some scenarios, though more testing is needed to get accurate metrics.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

