mapleFU commented on issue #8643: URL: https://github.com/apache/arrow-rs/issues/8643#issuecomment-3474159198
As @etseidl pointed out to me, the code below (which lives in the parquet module, not the arrow module) reads only the required row groups.

https://github.com/apache/arrow-rs/blob/bac0cb57af36f0c025696db146eccea8f3f469cb/parquet/src/file/serialized_reader.rs#L200-L216

> What I think would help is APIs for progressively reading / populating the metadata (e.g. initially only read 5 columns, but then be able to incrementally parse / produce the remaining columns after) -- this maybe is APIs on ParquetMetaData to add new columns, / row groups

I think there could be an interface or stage after the footer statistics are loaded (like the code above), perhaps between `DecodeState::ReadingPageIndex` and `DecodeState::Finished`. Adding the new statistics at that point might be a good approach for the page index.

Besides, this might work like `ArrowPredicate` in the arrow module: the statistics and column index are needed only for the `ArrowPredicate::projection` columns, while all other required columns should still have their offset index read.

I'm new here, so I'm not sure: would `ArrowPredicate` also be used here?
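To make the column split above concrete, here is a minimal, std-only sketch of the planning step. It is not arrow-rs code: `IndexNeed` and `plan_page_index` are hypothetical names, and the plain index sets stand in for whatever `ArrowPredicate::projection` and the output projection would select. It also assumes that predicate columns need the offset index in addition to the column index, so that pruned pages can be mapped back to row ranges.

```rust
use std::collections::BTreeSet;

/// Which page-index structures to load for one leaf column.
/// Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IndexNeed {
    /// Column is not read at all: skip both indexes.
    Nothing,
    /// Column is decoded but never filtered on: page locations are enough.
    OffsetIndexOnly,
    /// Column feeds a predicate: per-page min/max plus page locations
    /// (assumption: the offset index is needed to turn pages into row ranges).
    ColumnAndOffsetIndex,
}

/// Decide, for each leaf column, which parts of the page index to load.
/// `predicate_columns` and `output_columns` hold leaf column indices.
fn plan_page_index(
    num_columns: usize,
    predicate_columns: &BTreeSet<usize>,
    output_columns: &BTreeSet<usize>,
) -> Vec<IndexNeed> {
    (0..num_columns)
        .map(|col| {
            if predicate_columns.contains(&col) {
                // A predicate column takes priority even if it is also an output column.
                IndexNeed::ColumnAndOffsetIndex
            } else if output_columns.contains(&col) {
                IndexNeed::OffsetIndexOnly
            } else {
                IndexNeed::Nothing
            }
        })
        .collect()
}
```

For a five-column file with a predicate on column 1 and output columns 0, 1 and 3, this plan would load the column index for column 1 only, the offset index for columns 0, 1 and 3, and nothing for columns 2 and 4.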
