alamb commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-4013276306
@Dandandan

> But there are also some with only a few row groups - so there is still potential for further parallelization for smaller scans and speeding up the final part of the execution.
>
> first split the query into row group morsels
> when the remaining queue is small, further split the morsels into smaller ones

I actually think we could do this fairly efficiently (with relatively low overhead) using the `RowSelection` API -- that is, fetch the data for a single Row Group, but then potentially break that row group up into several smaller morsels (e.g. 10-20 batches worth of rows) 🤔

> I think we have to split such changes into a couple of PRs so we can handle each change individually.

100% agree

I also think figuring out how filter pushdown fits in will be important
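
To make the row-group splitting idea concrete, here is a minimal sketch in plain Rust, not code from the PR. `Morsel`, `split_row_group`, and `skip_select` are hypothetical names used only for illustration, and the sketch deliberately avoids the parquet crate so it stands alone. The assumption is that each morsel's `(skip, select)` pair could later be turned into a `RowSelection` built from a `RowSelector::skip` followed by a `RowSelector::select`.

```rust
use std::ops::Range;

/// Hypothetical unit of work: a contiguous range of rows inside one row group.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Morsel {
    pub row_group: usize,
    pub rows: Range<usize>,
}

/// Split a row group of `num_rows` rows into morsels of at most `max_rows` rows.
/// The last morsel may be shorter; an empty row group yields no morsels.
pub fn split_row_group(row_group: usize, num_rows: usize, max_rows: usize) -> Vec<Morsel> {
    assert!(max_rows > 0, "max_rows must be non-zero");
    (0..num_rows)
        .step_by(max_rows)
        .map(|start| Morsel {
            row_group,
            rows: start..(start + max_rows).min(num_rows),
        })
        .collect()
}

/// Express a morsel as run lengths: rows to skip from the start of the row
/// group, then rows to select. Assumption: these would map onto
/// `RowSelector::skip(skip)` and `RowSelector::select(select)`.
pub fn skip_select(morsel: &Morsel) -> (usize, usize) {
    (morsel.rows.start, morsel.rows.len())
}
```

For example, assuming a batch size of 8192 rows and 10 batches per morsel, `split_row_group(0, 100_000, 10 * 8192)` yields two morsels covering rows `0..81920` and `81920..100000`. The data for the row group would still be fetched once; only the decode work is divided.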
