adriangb commented on issue #21399:
URL: https://github.com/apache/datafusion/issues/21399#issuecomment-4195986905

   > The new arrow-rs APIs (peek/skip) are for a different purpose: dynamically 
skipping row groups during execution based on the TopK threshold. The access 
plan is fixed before the decoder starts — once it begins reading, it processes 
all selected RGs in order. But after reading the first RG, TopK sets a tight 
threshold (e.g., id > 999991), and the remaining 19 RGs can be skipped 
entirely. Without peek/skip, the decoder has no way to stop mid-file.
   
   I think this will be sorted out by the morsels API: we can do additional 
data skipping before each morsel (still some overhead but nowhere near as much 
as reading the row group)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to