alamb commented on issue #18489: URL: https://github.com/apache/datafusion/issues/18489#issuecomment-4613677853
> We can do better though by pushing down the row ids to parquet instead (a.k.a. late materialization) (compute top k on columns, then only scan top k row-ids). It looks like DuckDB also supports this (not sure if it also does it for Q23)? https://github.com/duckdb/duckdb/pull/17325

In my mind this is no different than TopK dynamic filtering with predicate pushdown.

The idea is that the effective matching row ids that would come out of the TopK are created implicitly while evaluating the dynamic predicate.

So instead of materializing row ids and then passing those row ids back down to the scan, non-passing row ids never leave the scan.

At least that is how I think about it
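
To make the comparison concrete, here is a minimal toy sketch in Python. It is not DataFusion's or DuckDB's implementation; the function names, the in-memory lists standing in for parquet columns, and the `materialized` counter are all hypothetical and exist only for illustration. It models `ORDER BY key ASC LIMIT k` two ways: once with the TopK threshold applied as a predicate inside the scan, and once by computing row ids first and fetching the payload afterwards.

```python
import heapq


def topk_with_dynamic_filter(keys, payloads, k):
    """ORDER BY key ASC LIMIT k with the TopK threshold checked inside the scan.

    Returns (rows, materialized), where rows are (key, row_id, payload) tuples
    and materialized counts how many payload values the scan had to decode.
    """
    if k <= 0:
        return [], 0
    heap = []  # entries are (-key, -row_id, payload); heap[0] is the current worst row
    materialized = 0
    for row_id, key in enumerate(keys):
        # Dynamic predicate: once k rows are held, a row can only matter if its
        # key beats the current k-th smallest key. Ties keep the earlier row.
        if len(heap) == k and key >= -heap[0][0]:
            continue  # the row never leaves the scan; its payload is not decoded
        payload = payloads[row_id]  # stand-in for decoding the wide column
        materialized += 1
        entry = (-key, -row_id, payload)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        else:
            heapq.heapreplace(heap, entry)  # evict the worst row, tightening the filter
    rows = sorted((-neg_key, -neg_id, payload) for neg_key, neg_id, payload in heap)
    return rows, materialized


def topk_with_row_ids(keys, payloads, k):
    """Same query in two phases: pick the top k row ids, then fetch their payloads."""
    row_ids = heapq.nsmallest(k, range(len(keys)), key=lambda i: (keys[i], i))
    rows = [(keys[i], i, payloads[i]) for i in row_ids]
    return rows, len(row_ids)
```

Both functions return the same rows. In this toy the filtered scan can still decode a few payloads for rows that are displaced later, and how many depends on the order of the data, so the sketch only illustrates the shape of the idea, not relative performance.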
