alamb commented on PR #10738:
URL: https://github.com/apache/datafusion/pull/10738#issuecomment-2163999195

   > @alamb is there any documentation on what it means for DataFusion to
   > "scan" specific rows within a row group? Does it actually read only those rows?
   > I'd imagine that because of some mix of compression and limitations of byte
   > range fetches to contiguous bytes for object stores you end up streaming entire
   > row groups anyway.
   
   Specifically, DataFusion uses this API:
https://github.com/apache/arrow-rs/blob/0cc14168000e1e41fc5f63929d34d13dda6e5873/parquet/src/arrow/arrow_reader/mod.rs#L137-L194
   
   If the file has the PageIndex (which the parquet-rs writer writes by
default), the reader may be able to skip certain pages.
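
   To make the page-skipping idea concrete, here is a rough, std-only
sketch. It is not the arrow-rs API: `Run` and `pages_to_read` are hypothetical
names made up for illustration, and the model assumes a single column whose
per-page row counts are known (the kind of information a page index provides).
Given a selection expressed as alternating skip/select runs, it reports which
pages contain at least one selected row. Pages outside that set are the ones a
reader could avoid fetching and decompressing.

```rust
/// A run of consecutive rows that is either skipped or selected.
/// Hypothetical type for illustration only; not the arrow-rs type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Run {
    pub row_count: usize,
    pub skip: bool,
}

/// Return the indices of the pages that hold at least one selected row.
///
/// `page_row_counts[i]` is the number of rows in page `i` of one column
/// chunk. `selection` is a list of runs covering the rows of that chunk
/// in order, starting at row 0.
pub fn pages_to_read(page_row_counts: &[usize], selection: &[Run]) -> Vec<usize> {
    // Convert the runs into half-open [start, end) ranges of selected rows.
    let mut selected: Vec<(usize, usize)> = Vec::new();
    let mut row = 0;
    for run in selection {
        if !run.skip && run.row_count > 0 {
            selected.push((row, row + run.row_count));
        }
        row += run.row_count;
    }

    // A page must be read if its row range overlaps any selected range.
    let mut pages = Vec::new();
    let mut page_start = 0;
    for (idx, &count) in page_row_counts.iter().enumerate() {
        let page_end = page_start + count;
        let overlaps = selected
            .iter()
            .any(|&(start, end)| start < page_end && end > page_start);
        if overlaps {
            pages.push(idx);
        }
        page_start = page_end;
    }
    pages
}
```

   For example, with four pages of 100 rows each and a selection that skips
150 rows, selects 20, and skips the remaining 230, only page index 1 (rows
100 to 199) needs to be read. Even in this toy model the unit of skipping is
a whole page, not an individual row: a page containing any selected row is
still read in full.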


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

