Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

via GitHub Fri, 06 Mar 2026 10:20:10 -0800


alamb commented on PR #20481:
URL: https://github.com/apache/datafusion/pull/20481#issuecomment-4013276306


   
   @Dandandan 
   
   > But there are also some with only a few row groups - so there is still 
potential for further parallelization for smaller scans and speeding up the 
final part of the execution.
   
   > first split the query into row group morsels
   > when the remaining queue is small, further split the morsels into smaller 
ones
   
   I actually think we could do this fairly efficiently (with relatively low 
overhead) using the `RowSelection` API: fetch the data for a single row 
group, but then potentially break that row group up into several smaller morsels 
(e.g. 10-20 batches' worth of rows) 🤔  
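
   A minimal sketch of that splitting step, using only the standard library. 
The `Morsel` type, the `split_row_group` helper, and the batch size of 8192 are 
hypothetical names and values chosen for illustration; they are not taken from 
the PR. Each morsel is a contiguous row range, and `skip_select` shows the 
(skip, select) shape such a range would presumably take when expressed as a 
row selection over the row group.

```rust
use std::ops::Range;

/// Hypothetical morsel: a contiguous run of rows inside one row group.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Morsel {
    row_group: usize,
    rows: Range<usize>,
}

impl Morsel {
    /// Express the morsel as a (skip, select) pair: skip the rows that
    /// precede it in the row group, then select the rows it covers.
    fn skip_select(&self) -> (usize, usize) {
        (self.rows.start, self.rows.end - self.rows.start)
    }
}

/// Split one row group of `num_rows` rows into morsels of roughly
/// `batch_size * batches_per_morsel` rows each (the last may be shorter).
fn split_row_group(
    row_group: usize,
    num_rows: usize,
    batch_size: usize,
    batches_per_morsel: usize,
) -> Vec<Morsel> {
    // Guard against a zero step, which `step_by` does not accept.
    let morsel_rows = batch_size.saturating_mul(batches_per_morsel).max(1);
    (0..num_rows)
        .step_by(morsel_rows)
        .map(|start| Morsel {
            row_group,
            rows: start..start.saturating_add(morsel_rows).min(num_rows),
        })
        .collect()
}

fn main() {
    // Assumed values: a 1M-row row group, 8192-row batches, 16 batches per morsel.
    let morsels = split_row_group(0, 1_000_000, 8192, 16);
    for m in &morsels {
        println!("{:?} -> skip/select {:?}", m.rows, m.skip_select());
    }
}
```

   In this sketch the data for the row group would be fetched once, and each 
morsel would only change which rows are decoded; whether that holds in 
practice depends on how the reader handles the selection.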
   
   > I think we have to split such changes into a couple of PRs so we can 
handle each change individually.
   
   100% agree 
   
   I also think figuring out how filter pushdown fits in will be important


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


