adriangb commented on issue #21399: URL: https://github.com/apache/datafusion/issues/21399#issuecomment-4195986905
> The new arrow-rs APIs (peek/skip) are for a different purpose: dynamically skipping row groups during execution based on the TopK threshold. The access plan is fixed before the decoder starts — once it begins reading, it processes all selected RGs in order. But after reading the first RG, TopK sets a tight threshold (e.g., id > 999991), and the remaining 19 RGs can be skipped entirely. Without peek/skip, the decoder has no way to stop mid-file. I think this will be sorted out by the morsels API: we can do additional data skipping before each morsel (still some overhead but nowhere near as much as reading the row group) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
