[ https://issues.apache.org/jira/browse/PARQUET-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated PARQUET-2323:
------------------------------------
    Labels: pull-request-available  (was: )

> Use bit vector to store Prebuffered column chunk index
> ------------------------------------------------------
>
>                 Key: PARQUET-2323
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2323
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Jinpeng Zhou
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: cpp-13.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In https://issues.apache.org/jira/browse/PARQUET-2316 we allowed partial
> buffering in the Parquet file reader by storing the prebuffered column chunk
> indices in a hash set and copying this hash set for each row group reader.
> In extreme conditions, where numerous columns are prebuffered and multiple
> row group readers are created for the same row group, the hash set would
> incur significant overhead.
> Using a bit vector would be a reasonable mitigation, taking 4KB for 32K
> columns.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
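
The 4KB figure in the description follows from storing one bit per column: 32K columns / 8 bits per byte = 4096 bytes. Below is a minimal C++ sketch of that idea. The class name `PrebufferedColumns` and its methods are hypothetical, chosen for illustration only; this is not the actual parquet-cpp change and does not use the parquet-cpp API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration (not parquet-cpp code): tracks which column
// chunks of a row group were prebuffered, using one bit per column.
class PrebufferedColumns {
 public:
  explicit PrebufferedColumns(std::size_t num_columns)
      : num_columns_(num_columns),
        words_((num_columns + 63) / 64, 0) {}  // round up to whole 64-bit words

  // Record that the chunk for `column` has been prebuffered.
  void Mark(std::size_t column) {
    assert(column < num_columns_);
    words_[column / 64] |= std::uint64_t{1} << (column % 64);
  }

  // Out-of-range columns are reported as not prebuffered.
  bool IsPrebuffered(std::size_t column) const {
    if (column >= num_columns_) return false;
    return ((words_[column / 64] >> (column % 64)) & 1u) != 0;
  }

  // Bytes used by the bit storage itself.
  std::size_t SizeInBytes() const {
    return words_.size() * sizeof(std::uint64_t);
  }

 private:
  std::size_t num_columns_;
  std::vector<std::uint64_t> words_;
};
```

With 32 * 1024 columns, `SizeInBytes()` is 4096 regardless of how many columns are marked. Copying the object for each row group reader copies one contiguous array of words, whereas a hash set copy has to rebuild a separate entry for every stored index.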