[GitHub] [arrow] jp0317 opened a new pull request, #36649: PARQUET-2323: [C++] use bit vector to store prebuffered column chunks

via GitHub Wed, 12 Jul 2023 11:18:45 -0700


jp0317 opened a new pull request, #36649:
URL: https://github.com/apache/arrow/pull/36649


   
   ### Rationale for this change
   
   In https://issues.apache.org/jira/browse/PARQUET-2316 we enabled partial 
buffering in the Parquet file reader by storing the indices of prebuffered 
column chunks in a hash set and copying that hash set into each row group reader.
   
   In extreme cases where numerous columns are prebuffered and multiple 
row group readers are created for the same row group, copying the hash set 
incurs significant overhead.
   
   Using a bit vector (one bit per column) would be a reasonable mitigation, 
taking 4KB for 32K columns.
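
The 4KB figure follows from storing one bit per column. A minimal sketch of that arithmetic, where `BitVectorBytes` is a hypothetical helper name used only for illustration and is not part of the Arrow code base:

```cpp
#include <cstddef>

// Hypothetical helper (not an Arrow API): bytes needed to hold one bit
// per column, rounded up to a whole number of bytes.
constexpr std::size_t BitVectorBytes(std::size_t num_columns) {
  return (num_columns + 7) / 8;
}

// 32K columns -> 32768 bits -> 4096 bytes, i.e. 4KB.
static_assert(BitVectorBytes(32 * 1024) == 4096, "32K columns fit in 4KB");
```

A hash set of the same indices stores at least one integer per prebuffered column plus per-node and bucket overhead, so its footprint grows with the number of prebuffered columns rather than staying at this fixed size.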
   
   ### What changes are included in this PR?
   
   Switching from a hash set to a bool vector.
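
As a rough illustration of the idea (not the code in this PR), the sketch below builds a `std::vector<bool>` from a list of prebuffered column indices and checks membership by position. The names `MakePrebufferedBitmap` and `IsPrebuffered` are hypothetical, and the sketch assumes the common case where `std::vector<bool>` is stored as packed bits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: mark every prebuffered column index in a bitmap
// sized to the number of columns in the row group.
std::vector<bool> MakePrebufferedBitmap(const std::vector<int>& column_indices,
                                        int num_columns) {
  assert(num_columns >= 0);
  std::vector<bool> bitmap(static_cast<std::size_t>(num_columns), false);
  for (int i : column_indices) {
    assert(i >= 0 && i < num_columns);  // indices must refer to real columns
    bitmap[static_cast<std::size_t>(i)] = true;
  }
  return bitmap;
}

// Hypothetical helper: position-based lookup replaces the hash set lookup.
// Out-of-range indices are reported as not prebuffered.
bool IsPrebuffered(const std::vector<bool>& bitmap, int column_index) {
  return column_index >= 0 &&
         static_cast<std::size_t>(column_index) < bitmap.size() &&
         bitmap[static_cast<std::size_t>(column_index)];
}
```

For example, `IsPrebuffered(MakePrebufferedBitmap({0, 3}, 5), 3)` is true while the same call for column 1 is false. Copying such a bitmap into each row group reader copies a fixed number of bytes per column count, independent of how many columns are prebuffered.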
   
   ### Are these changes tested?
   
   Yes, the unit tests for partial prebuffering pass.
   
   ### Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
