alamb commented on issue #15177:
URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2717443917

   Oooh, here is the DuckDB plan, and it shows what they are doing!
   
   The key is this line:
   ```
   │          Filters:         │
   │ optional: Dynamic Filter  │
   │        (EventTime)        │
   ```
   
   What I think this is referring to is what @adriangb is describing in:
   - https://github.com/apache/datafusion/issues/15037
   
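   Specifically, the TOP_N operator passes a filter down into the scan. The filter is "dynamic" in the sense that:
   1. The TOP_N operator knows the largest `EventTime` currently among the 10 smallest values it has kept so far (the current cutoff), and that cutoff only tightens as more rows arrive.
   2. That means the scan only needs to return rows whose `EventTime` is less than that cutoff, because with `ORDER BY EventTime ASC` any other row cannot make it into the top 10.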
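
   The following minimal Rust sketch makes the mechanism concrete. It is not DuckDB's or DataFusion's implementation. `DynamicFilter`, `TopK`, and `run` are hypothetical names, `EventTime` is modeled as a plain `i64`, and the shared cutoff is an `Arc<AtomicI64>` (all assumptions for illustration). A max-heap holds the k smallest values seen so far. Once the heap is full, its root is published as the cutoff, and the scan drops any row that is not below it.

   ```rust
   use std::collections::BinaryHeap;
   use std::sync::atomic::{AtomicI64, Ordering};
   use std::sync::Arc;

   /// Hypothetical shared "dynamic filter": the cutoff published by TopK.
   /// i64::MAX means "no cutoff yet" (heap not full), so every row passes.
   #[derive(Clone)]
   struct DynamicFilter {
       threshold: Arc<AtomicI64>,
   }

   impl DynamicFilter {
       fn new() -> Self {
           Self { threshold: Arc::new(AtomicI64::new(i64::MAX)) }
       }
       /// Scan-side check: could this row still end up in the top k?
       fn may_pass(&self, event_time: i64) -> bool {
           event_time < self.threshold.load(Ordering::Relaxed)
       }
   }

   /// Hypothetical TopK for `ORDER BY event_time ASC LIMIT k`.
   /// A max-heap holds the k smallest values seen; its root is the cutoff.
   struct TopK {
       k: usize,
       heap: BinaryHeap<i64>,
       filter: DynamicFilter,
   }

   impl TopK {
       fn new(k: usize, filter: DynamicFilter) -> Self {
           Self { k, heap: BinaryHeap::new(), filter }
       }
       fn insert(&mut self, event_time: i64) {
           if self.heap.len() < self.k {
               self.heap.push(event_time);
           } else if let Some(&max) = self.heap.peek() {
               if event_time >= max {
                   return; // cannot improve the current top k
               }
               self.heap.pop();
               self.heap.push(event_time);
           }
           if self.heap.len() == self.k {
               if let Some(&max) = self.heap.peek() {
                   // Publish the largest value still in the top k as the new cutoff.
                   self.filter.threshold.store(max, Ordering::Relaxed);
               }
           }
       }
       fn into_sorted(self) -> Vec<i64> {
           self.heap.into_sorted_vec() // ascending order
       }
   }

   /// Hypothetical scan: rows failing the dynamic filter never reach TopK.
   /// Returns (top k values, number of rows forwarded to TopK).
   fn run(rows: &[i64], k: usize) -> (Vec<i64>, usize) {
       let filter = DynamicFilter::new();
       let mut topk = TopK::new(k, filter.clone());
       let mut forwarded = 0;
       for &t in rows {
           if filter.may_pass(t) {
               forwarded += 1;
               topk.insert(t);
           }
       }
       (topk.into_sorted(), forwarded)
   }

   fn main() {
       let (top, forwarded) = run(&[50, 40, 30, 60, 20, 70, 10, 80], 3);
       println!("top={:?} forwarded={}", top, forwarded);
   }
   ```

   With `k = 3`, the rows `[50, 40, 30, 60, 20, 70, 10, 80]` yield `[10, 20, 30]`, and only 5 of the 8 rows reach `TopK`. `TopK::insert` still rejects rows that cannot enter the heap. So skipping the `may_pass` check changes only how much work is done, not the result. That may be why DuckDB labels the filter `optional`, though this reading is an assumption.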
   
   ```
   ┌─────────────────────────────┐
   │┌───────────────────────────┐│
   ││       Physical Plan       ││
   │└───────────────────────────┘│
   └─────────────────────────────┘
   ┌───────────────────────────┐
   │           TOP_N           │
   │    ────────────────────   │
   │          Top: 10          │
   │                           │
   │         Order By:         │
   │ memory.main.hits.EventTime│
   │             ASC           │
   └─────────────┬─────────────┘
   ┌─────────────┴─────────────┐
   │           FILTER          │
   │    ────────────────────   │
   │  contains(URL, 'google')  │
   │                           │
   │       ~20000000 Rows      │
   └─────────────┬─────────────┘
   ┌─────────────┴─────────────┐
   │       PARQUET_SCAN        │
   │    ────────────────────   │
   │         Function:         │
   │        PARQUET_SCAN       │
   │                           │
   │        Projections:       │
   │          WatchID          │
   │         JavaEnable        │
   │           Title           │
   │         GoodEvent         │
   │         EventTime         │
   │         EventDate         │
   │         CounterID         │
   │          ClientIP         │
   │          RegionID         │
   │           UserID          │
   │        CounterClass       │
   │             OS            │
   │         UserAgent         │
   │            URL            │
   │            ...            │
   │      ParamCurrencyID      │
   │    OpenstatServiceName    │
   │     OpenstatCampaignID    │
   │        OpenstatAdID       │
   │      OpenstatSourceID     │
   │         UTMSource         │
   │         UTMMedium         │
   │        UTMCampaign        │
   │         UTMContent        │
   │          UTMTerm          │
   │          FromTag          │
   │          HasGCLID         │
   │        RefererHash        │
   │          URLHash          │
   │            CLID           │
   │                           │
   │          Filters:         │
   │ optional: Dynamic Filter  │
   │        (EventTime)        │
   │                           │
   │      ~100000000 Rows      │
   └───────────────────────────┘
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

