jp0317 opened a new pull request, #36649:
URL: https://github.com/apache/arrow/pull/36649

   
   ### Rationale for this change
   
   In https://issues.apache.org/jira/browse/PARQUET-2316 we allow partial prebuffering in the Parquet file reader by storing the indices of the prebuffered column chunks in a hash set, and we make a copy of this hash set for each row group reader.
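
   The design described above can be sketched as follows. This is an illustration only: `FileReaderSketch`, `RowGroupReaderSketch`, and `PrebufferColumns` are hypothetical names, not the actual Arrow/Parquet classes. The file reader keeps a `std::unordered_set<int>` of prebuffered column indices per row group, and every row group reader receives its own copy of that set.

```cpp
#include <cassert>
#include <map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical row group reader: owns a private copy of the set of
// prebuffered column indices for its row group.
class RowGroupReaderSketch {
 public:
  explicit RowGroupReaderSketch(std::unordered_set<int> prebuffered)
      : prebuffered_(std::move(prebuffered)) {}

  bool IsPrebuffered(int column) const {
    return prebuffered_.count(column) > 0;
  }

 private:
  std::unordered_set<int> prebuffered_;
};

// Hypothetical file reader: remembers which columns were prebuffered
// for each row group.
class FileReaderSketch {
 public:
  void PrebufferColumns(int row_group, const std::vector<int>& columns) {
    auto& cols = prebuffered_[row_group];
    cols.insert(columns.begin(), columns.end());
  }

  // Every call copies the whole hash set. In typical implementations this
  // allocates a bucket array plus one node per prebuffered column, and the
  // cost is paid again for each reader created for the same row group.
  RowGroupReaderSketch RowGroup(int row_group) const {
    auto it = prebuffered_.find(row_group);
    if (it == prebuffered_.end()) {
      return RowGroupReaderSketch(std::unordered_set<int>{});
    }
    return RowGroupReaderSketch(it->second);
  }

 private:
  std::map<int, std::unordered_set<int>> prebuffered_;
};
```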
   
   In extreme conditions, where many columns are prebuffered and multiple row group readers are created for the same row group, these hash set copies can incur significant overhead.
   
   Using a bit vector is a reasonable mitigation: with one bit per column, it takes 4KB for 32K columns.
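
   Reading 32K as 32,768 (2^15) columns, one bit each gives 32,768 / 8 = 4,096 bytes, i.e. 4KB. A minimal sketch of the replacement, assuming a `std::vector<bool>` indexed by column number; `PrebufferedColumnsSketch` and `BitVectorBytes` are hypothetical names. Note that `std::vector<bool>` is bit-packed in the common standard library implementations, but the exact layout is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bytes needed to store one bit per column, rounded up to a whole byte.
constexpr std::size_t BitVectorBytes(std::size_t num_columns) {
  return (num_columns + 7) / 8;
}

// Hypothetical holder: one flag per column of the row group. Copying it for
// a new row group reader is a single contiguous buffer copy, instead of one
// allocation per prebuffered column as with a node-based hash set.
class PrebufferedColumnsSketch {
 public:
  PrebufferedColumnsSketch(std::size_t num_columns,
                           const std::vector<int>& prebuffered)
      : flags_(num_columns, false) {
    for (int col : prebuffered) {
      // This sketch silently ignores out-of-range indices.
      if (col >= 0 && static_cast<std::size_t>(col) < flags_.size()) {
        flags_[static_cast<std::size_t>(col)] = true;
      }
    }
  }

  bool IsPrebuffered(int column) const {
    return column >= 0 && static_cast<std::size_t>(column) < flags_.size() &&
           flags_[static_cast<std::size_t>(column)];
  }

 private:
  std::vector<bool> flags_;
};

// 32K columns at one bit each fit in 4 KiB.
static_assert(BitVectorBytes(32 * 1024) == 4096, "32K columns -> 4 KiB");
```

   The trade-off is that memory now scales with the total number of columns rather than with the number of prebuffered ones, which stays small (about 4KB even at 32K columns) while making each per-reader copy cheap.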
   
   ### What changes are included in this PR?
   
   Switching the set of prebuffered column chunk indices from a hash set to a bool vector.
   
   ### Are these changes tested?
   
   Yes, the unit tests for partial prebuffering pass.
   
   ### Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.