alamb commented on issue #15177: URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2717443917
OOOO -- here is the duckdb plan and it shows what they are doing! The key is this line:

```
│          Filters:         │
│  optional: Dynamic Filter │
│        (EventTime)        │
```

What I think this is referring to is what @adriangb is describing in:
- https://github.com/apache/datafusion/issues/15037

Specifically, the `TOP_N` operator passes a filter down into the scan. The filter is "dynamic" in the sense that
1. the `TOP_N` operator knows what the smallest maximum value currently is (the largest `EventTime` among the 10 smallest values it has seen so far)
2. That means the scan can keep only the rows whose `EventTime` is less than that value and skip the rest

```
┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│           TOP_N           │
│    ────────────────────   │
│          Top: 10          │
│                           │
│         Order By:         │
│ memory.main.hits.EventTime│
│            ASC            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│    ────────────────────   │
│  contains(URL, 'google')  │
│                           │
│       ~20000000 Rows      │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│        PARQUET_SCAN       │
│    ────────────────────   │
│         Function:         │
│        PARQUET_SCAN       │
│                           │
│        Projections:       │
│          WatchID          │
│         JavaEnable        │
│           Title           │
│         GoodEvent         │
│         EventTime         │
│         EventDate         │
│         CounterID         │
│          ClientIP         │
│          RegionID         │
│           UserID          │
│        CounterClass       │
│             OS            │
│         UserAgent         │
│            URL            │
│            ...            │
│      ParamCurrencyID      │
│    OpenstatServiceName    │
│     OpenstatCampaignID    │
│        OpenstatAdID       │
│      OpenstatSourceID     │
│         UTMSource         │
│         UTMMedium         │
│        UTMCampaign        │
│         UTMContent        │
│          UTMTerm          │
│          FromTag          │
│          HasGCLID         │
│        RefererHash        │
│          URLHash          │
│            CLID           │
│                           │
│          Filters:         │
│  optional: Dynamic Filter │
│        (EventTime)        │
│                           │
│      ~100000000 Rows      │
└───────────────────────────┘
```

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
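To make the dynamic filter idea in the comment above concrete, here is a minimal single-threaded sketch in Rust. It is not DuckDB's or DataFusion's implementation; `Row`, `top_k_with_dynamic_filter`, and the `skipped` counter are hypothetical names made up for illustration. It assumes an ascending `ORDER BY` with a limit of `k`, and it keeps the `k` smallest keys in a max-heap so that the heap's top is the current threshold the scan can filter against.

```rust
use std::collections::BinaryHeap;

/// Hypothetical row type: only the sort key and the filtered column matter here.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    event_time: i64,
    url: String,
}

/// Returns the `k` smallest `event_time` values among rows whose URL contains
/// `needle`, plus the number of rows the dynamic filter allowed the scan to skip.
fn top_k_with_dynamic_filter(rows: &[Row], needle: &str, k: usize) -> (Vec<i64>, usize) {
    // Max-heap holding the k smallest keys seen so far. Its top element is the
    // "smallest maximum": the bound a new row must beat to enter the top k.
    let mut heap: BinaryHeap<i64> = BinaryHeap::new();
    let mut skipped = 0;

    for row in rows {
        // Dynamic filter, applied where the scan would apply it. Once the heap
        // is full, a row with event_time >= threshold can never make the top k.
        if heap.len() == k {
            if let Some(&threshold) = heap.peek() {
                if row.event_time >= threshold {
                    skipped += 1;
                    continue; // dropped before the more expensive string match
                }
            }
        }

        // Static filter, corresponding to contains(URL, 'google') in the plan.
        if !row.url.contains(needle) {
            continue;
        }

        heap.push(row.event_time);
        if heap.len() > k {
            heap.pop(); // evict the largest key, which tightens the threshold
        }
    }

    // into_sorted_vec yields ascending order, matching ORDER BY ... ASC.
    (heap.into_sorted_vec(), skipped)
}
```

In a real engine the threshold would be shared between the Top-N operator and the scan, and could also be compared against Parquet row group or page statistics to skip whole chunks rather than individual rows. The sketch only shows the row-level effect.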
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org