Re: [I] [EPIC] Make DataFusion the top of the ClickBench Parquet leaderboard [datafusion]

via GitHub Wed, 03 Jun 2026 07:53:22 -0700


alamb commented on issue #18489:
URL: https://github.com/apache/datafusion/issues/18489#issuecomment-4613677853


   > We can do better though by pushing down the row ids to parquet instead
   > (a.k.a. late materialization) (compute top k on columns, then only scan top k
   > row-ids). It looks like DuckDB also supports this (not sure if it also does it
   > for Q23)? https://github.com/duckdb/duckdb/pull/17325
   
   In my mind this is no different from TopK dynamic filtering with predicate
   pushdown.
   
   The idea is that the row ids that would effectively come out of the TopK are
   produced implicitly while the scan evaluates the dynamic predicate.
   
   So instead of materializing row ids and then passing them back down to the
   scan, non-passing row ids never leave the scan.
   
   At least that is how I think about it.
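
To make the comparison concrete, here is a minimal toy sketch in Python of a TopK whose current threshold acts as a dynamic predicate inside the scan. It is not DataFusion code: the function name `topk_with_dynamic_filter`, the two in-memory "columns", and the `materialized` counter are all hypothetical, and the query is assumed to be of the form ORDER BY sort_col DESC LIMIT k. The point is only that rows failing the predicate never have their payload decoded, so no explicit row-id list has to be built and passed back down.

```python
import heapq


def topk_with_dynamic_filter(sort_col, payload_col, k, batch_size=4):
    """Toy model of ORDER BY sort_col DESC LIMIT k with a dynamic filter.

    Hypothetical illustration only; this is not how DataFusion is implemented.
    Returns (top-k rows as (sort_value, payload), number of payloads decoded).
    """
    if k <= 0:
        return [], 0

    heap = []          # min-heap of (sort_value, row_id, payload)
    threshold = None   # dynamic predicate is: sort_value > threshold
    materialized = 0   # payload values the "scan" actually decoded

    for start in range(0, len(sort_col), batch_size):
        stop = min(start + batch_size, len(sort_col))

        # Scan side: evaluate the current dynamic predicate on the sort
        # column only. The passing row ids exist only implicitly, here.
        passing = [
            i for i in range(start, stop)
            if threshold is None or sort_col[i] > threshold
        ]

        # Only passing rows get their payload decoded; the rest never
        # leave the scan.
        rows = [(sort_col[i], i, payload_col[i]) for i in passing]
        materialized += len(rows)

        # TopK side: keep the k largest values seen so far.
        for row in rows:
            if len(heap) < k:
                heapq.heappush(heap, row)
            elif row[0] > heap[0][0]:
                heapq.heapreplace(heap, row)

        # Tighten the predicate once the heap is full.
        if len(heap) == k:
            threshold = heap[0][0]

    result = sorted(heap, key=lambda r: r[0], reverse=True)
    return [(value, payload) for value, _, payload in result], materialized


sort_col = [5, 1, 9, 3, 2, 8, 7, 4, 6, 0, 10, 1]
payload_col = [f"row{i}" for i in range(len(sort_col))]
top, decoded = topk_with_dynamic_filter(sort_col, payload_col, k=3)
print(top, decoded)
```

In this toy run only 8 of the 12 payload values are decoded. How much is skipped in general depends on the data order and on how quickly the threshold tightens.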
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

