alamb commented on issue #15037:
URL: https://github.com/apache/datafusion/issues/15037#issuecomment-2734575745

   > Does anyone have a handle on how we might implement this? I was thinking
   > we'd need to add a method to exec operators called `apply_filter` that
   > basically sends down the additional filter, and by default it gets forwarded
   > to children until it hits an exec that knows what to do with it (e.g.
   > `DataSourceExec`). But I'm not very clear beyond that.
   
   To begin with I would suggest:
   1. Make a new `PhysicalExpr` named something like `TopKRuntimeFilter`
   2. Add a physical optimizer pass that runs after all other passes (so the
plan structure doesn't change afterwards) that finds `TopK` nodes and tries to
find the Scans connected to them (start with some basic rules, don't try to go
past joins, etc.)
   3. Add the `TopKRuntimeFilter` to those scans
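
As a rough illustration of steps 2 and 3, here is a minimal sketch of such a pass over a toy plan tree. Everything in it is an assumption made for illustration: the `Plan` enum, `attach_topk_filters`, and `push_to_scans` are hypothetical and are not DataFusion's `ExecutionPlan` or optimizer-rule APIs, and the filter is represented by a plain string label rather than a real `PhysicalExpr`.

```rust
/// Toy stand-in for a physical plan node (hypothetical, not DataFusion's
/// `ExecutionPlan`).
#[allow(dead_code)]
#[derive(Debug)]
enum Plan {
    Scan { table: String, runtime_filters: Vec<String> },
    Filter { input: Box<Plan> },
    Join { left: Box<Plan>, right: Box<Plan> },
    TopK { sort_column: String, k: usize, input: Box<Plan> },
}

/// Find `TopK` nodes and tag the scans connected to them.
/// Returns the number of scans that received a filter.
fn attach_topk_filters(plan: &mut Plan) -> usize {
    match plan {
        Plan::TopK { sort_column, input, .. } => {
            // The label stands in for a real `TopKRuntimeFilter` expression.
            let label = format!("TopKRuntimeFilter({sort_column})");
            push_to_scans(input, &label)
        }
        Plan::Filter { input } => attach_topk_filters(input),
        Plan::Join { left, right } => {
            attach_topk_filters(left) + attach_topk_filters(right)
        }
        Plan::Scan { .. } => 0,
    }
}

/// Descend from a `TopK` towards its scans using deliberately basic rules.
fn push_to_scans(plan: &mut Plan, label: &str) -> usize {
    match plan {
        Plan::Scan { runtime_filters, .. } => {
            runtime_filters.push(label.to_string());
            1
        }
        // A filter only removes rows, so the TopK bound still holds below it.
        Plan::Filter { input } => push_to_scans(input, label),
        // Conservative: do not go past joins or a nested TopK.
        Plan::Join { .. } | Plan::TopK { .. } => 0,
    }
}
```

With this sketch, a `TopK` over a `Filter` over a `Scan` tags the scan, while a `TopK` directly over a `Join` tags nothing.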
   
   Then the trick will be to figure out how to share the `TopKHeap` created in
the TopK operator

https://github.com/apache/datafusion/blob/8c8b2454cbd78204dc6426f9898b79c179486a86/datafusion/physical-plan/src/topk/mod.rs#L259

   with the `TopKRuntimeFilter`, and then orchestrate concurrent access to it
somehow.
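
One possible way to approach the sharing, sketched with the standard library only: rather than sharing the whole heap, the operator could publish just its current cutoff value through an `Arc<RwLock<...>>`, and the scan-side filter could read it. This is an assumption-laden sketch, not how DataFusion does or will do it: `SharedThreshold`, `ToyTopK`, and `ToyRuntimeFilter` are hypothetical names, values are plain `i64`, and the query is assumed to be `ORDER BY x ASC LIMIT k`.

```rust
use std::collections::BinaryHeap;
use std::sync::{Arc, RwLock};

/// Hypothetical shared state: the largest value currently kept in the top K.
/// `None` means the heap is not full yet, so nothing can be pruned.
#[derive(Clone, Default)]
struct SharedThreshold(Arc<RwLock<Option<i64>>>);

impl SharedThreshold {
    fn get(&self) -> Option<i64> {
        *self.0.read().unwrap()
    }
    fn set(&self, value: i64) {
        *self.0.write().unwrap() = Some(value);
    }
}

/// Toy TopK for `ORDER BY x ASC LIMIT k`: a max-heap holding the k smallest
/// values seen so far (a stand-in for the real `TopKHeap`).
struct ToyTopK {
    k: usize,
    heap: BinaryHeap<i64>,
    threshold: SharedThreshold,
}

impl ToyTopK {
    fn new(k: usize, threshold: SharedThreshold) -> Self {
        Self { k, heap: BinaryHeap::new(), threshold }
    }

    fn insert(&mut self, value: i64) {
        if self.heap.len() < self.k {
            self.heap.push(value);
        } else if let Some(&worst) = self.heap.peek() {
            if value >= worst {
                return; // cannot enter the top K, threshold unchanged
            }
            self.heap.pop();
            self.heap.push(value);
        }
        // Publish the cutoff once the heap is full.
        if self.heap.len() == self.k {
            if let Some(&worst) = self.heap.peek() {
                self.threshold.set(worst);
            }
        }
    }
}

/// Scan-side filter: a row can be skipped if it cannot beat the cutoff.
struct ToyRuntimeFilter {
    threshold: SharedThreshold,
}

impl ToyRuntimeFilter {
    fn may_qualify(&self, value: i64) -> bool {
        match self.threshold.get() {
            Some(cutoff) => value < cutoff,
            None => true,
        }
    }
}
```

The scan only ever reads a possibly stale cutoff. That is safe in this sketch because the cutoff only tightens over time, so a stale read lets extra rows through but never drops a row that belongs in the result.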


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

