sdf-jkl commented on PR #20497:
URL: https://github.com/apache/datafusion/pull/20497#issuecomment-4061269624
Sorry, I think I got things mixed up while working on this.

We consider a column `sorted` by checking the `page_index` ordering (`min/max`) for that column across the pages in each row group: if those pages are ordered, we treat the column as sorted. Such a column is usually a strong candidate for row group/page pruning, so we prune on it first. Whatever survives pruning then goes to `row_filter`.

For a range predicate on a sorted column, `row_filter` is then likely to trim rows mostly at the boundaries of the kept window. That is often a relatively small contiguous region, though it can still span full page(s) once we apply the selection to heavier columns. Because most of the pruning value was already captured earlier, the incremental benefit of evaluating this column's predicate early in Late Materialization is likely marginal in many workloads.
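A minimal sketch of the reasoning above, under simplifying assumptions. `PageStats`, `PageAction`, `is_sorted_ascending`, and `classify` are hypothetical names for illustration, not DataFusion or arrow-rs APIs. The sortedness check assumes "ordered" means both page minimums and page maximums are non-decreasing (similar in spirit to Parquet's ASCENDING boundary order). Each page is classified against a range predicate `lo <= x <= hi`: pages that cannot match are pruned, pages that fully match are kept as-is, and only the rest need the row filter.

```rust
/// Hypothetical per-page min/max for one column in one row group,
/// standing in for page index entries (not a real DataFusion type).
#[derive(Debug, Clone, Copy)]
struct PageStats {
    min: i64,
    max: i64,
}

/// What happens to a page under a range predicate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PageAction {
    /// No row can match: skipped by page pruning.
    Prune,
    /// Every row matches: kept without row-level filtering.
    KeepAll,
    /// Some rows may match: the row filter must evaluate this page.
    Filter,
}

/// Assumed "sorted" heuristic: both page mins and page maxes are
/// non-decreasing across the pages of a row group.
fn is_sorted_ascending(pages: &[PageStats]) -> bool {
    pages
        .windows(2)
        .all(|w| w[0].min <= w[1].min && w[0].max <= w[1].max)
}

/// Classify each page against the range predicate `lo <= x <= hi`.
fn classify(pages: &[PageStats], lo: i64, hi: i64) -> Vec<PageAction> {
    pages
        .iter()
        .map(|p| {
            if p.max < lo || p.min > hi {
                PageAction::Prune
            } else if p.min >= lo && p.max <= hi {
                PageAction::KeepAll
            } else {
                PageAction::Filter
            }
        })
        .collect()
}

fn main() {
    let pages = [
        PageStats { min: 0, max: 99 },
        PageStats { min: 100, max: 199 },
        PageStats { min: 200, max: 299 },
        PageStats { min: 300, max: 399 },
        PageStats { min: 400, max: 499 },
    ];
    assert!(is_sorted_ascending(&pages));

    // Range predicate: 150 <= x <= 350
    let actions = classify(&pages, 150, 350);
    println!("{:?}", actions);
}
```

With these five pages and the predicate `150 <= x <= 350`, this prints `[Prune, Filter, KeepAll, Filter, Prune]`: only the two pages at the edges of the kept window need row-level filtering. Because both the mins and the maxes are monotone under this assumption, the pages classified as `Filter` can only sit at the two ends of the kept window, never in its middle.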
