mapleFU commented on issue #8643:
URL: https://github.com/apache/arrow-rs/issues/8643#issuecomment-3474159198

   As @etseidl tells me, the code below (which lives in the parquet module, not 
the arrow module) reads the required row group.
   
   
https://github.com/apache/arrow-rs/blob/bac0cb57af36f0c025696db146eccea8f3f469cb/parquet/src/file/serialized_reader.rs#L200-L216
   
   > What I think would help is APIs for progressively reading / populating the 
metadata (e.g. initially only read 5 columns, but then be able to incrementally 
parse / produce the remaining columns after) -- this maybe is APIs on 
ParquetMetaData to add new columns, / row groups
   
   I think there could be an interface or stage after the footer statistics are 
loaded (like the code above, perhaps between `DecodeState::ReadingPageIndex` and 
`DecodeState::Finished`). Adding the new statistics at that point may be a good 
approach for the page index.
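
   A minimal sketch of such an extra stage, as a plain state machine. Everything 
here is hypothetical: `Stage`, `ExtendingColumns`, and `next` are made-up names 
for illustration and are not the decoder's actual types; only the idea of a step 
between reading the page index and finishing comes from the discussion above.

```rust
/// Hypothetical decode stages (illustration only, not the arrow-rs API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    ReadingFooter,
    ReadingPageIndex,
    /// Assumed extra stage: load statistics / page index for columns
    /// that were not part of the initial request.
    ExtendingColumns,
    Finished,
}

impl Stage {
    /// Advance to the next stage. `more_columns_requested` is true when the
    /// caller has asked for columns whose metadata is not decoded yet.
    pub fn next(self, more_columns_requested: bool) -> Stage {
        match self {
            Stage::ReadingFooter => Stage::ReadingPageIndex,
            // Stay in (or enter) the extra stage while requests are pending.
            Stage::ReadingPageIndex | Stage::ExtendingColumns if more_columns_requested => {
                Stage::ExtendingColumns
            }
            Stage::ReadingPageIndex | Stage::ExtendingColumns => Stage::Finished,
            Stage::Finished => Stage::Finished,
        }
    }
}
```

   With no extra columns requested, the sketch goes straight from 
`ReadingPageIndex` to `Finished`, so the existing flow is unchanged.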
   
   Besides, this might work like `ArrowPredicate` in the arrow module: the 
statistics/column index is needed for the `ArrowPredicate::projection` columns, 
while the other required columns should read the full offset index. I'm new here, 
so I don't know: would `ArrowPredicate` also be used here?
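
   The column split described above could be sketched like this. `PageIndexPlan` 
and `plan_page_index` are hypothetical names, columns are identified by plain 
leaf indices, and the sketch assumes that predicate columns also need their 
offset index since they are read as well.

```rust
use std::collections::BTreeSet;

/// Hypothetical plan: which leaf columns need which page-index structure.
#[derive(Debug, PartialEq, Eq)]
pub struct PageIndexPlan {
    /// Columns whose column index (per-page statistics) should be loaded.
    pub column_index: BTreeSet<usize>,
    /// Columns whose offset index (page locations) should be loaded.
    pub offset_index: BTreeSet<usize>,
}

/// `predicate_columns`: leaf columns the filter evaluates.
/// `output_columns`: leaf columns in the final projection.
pub fn plan_page_index(predicate_columns: &[usize], output_columns: &[usize]) -> PageIndexPlan {
    // Only filter columns need per-page statistics for pruning.
    let column_index: BTreeSet<usize> = predicate_columns.iter().copied().collect();
    // Assumption: every column that is actually read needs page locations.
    let offset_index: BTreeSet<usize> = predicate_columns
        .iter()
        .chain(output_columns.iter())
        .copied()
        .collect();
    PageIndexPlan { column_index, offset_index }
}
```

   For example, filtering on column 2 while projecting columns 0, 2 and 5 would 
load the column index for column 2 only and the offset index for all three.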
   
   


