alamb commented on issue #15037: URL: https://github.com/apache/datafusion/issues/15037#issuecomment-2734575745
> Does anyone have a handle on how we might implement this? I was thinking we’d need to add a method to exec operators called `apply_filter` but that basically sends down the additional filter and by default it gets forwarded to children until it hits an exec that knows what to do with it (eg DataSourceExec). But I’m not very clear beyond that.

To begin with I would suggest:
1. Make a new PhysicalExpr named something like `TopKRuntimeFilter`
2. Add a physical optimizer pass that runs after all other passes (so the plan structure doesn't change) that finds `TopK` nodes and tries to find the scans connected to them (start with some basic rules, don't try to go past joins, etc.)
3. Add `TopKRuntimeFilter` to those scans

Then the trick will be to figure out how to share the `TopKHeap` created in the TopK operator
https://github.com/apache/datafusion/blob/8c8b2454cbd78204dc6426f9898b79c179486a86/datafusion/physical-plan/src/topk/mod.rs#L259
with the `TopKRuntimeFilter`, and then orchestrate concurrent access to it somehow.
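As a rough illustration of the sharing idea (not DataFusion code), here is a minimal std-only Rust sketch. Every name in it (`SharedThreshold`, `TopKSketch`, `RuntimeFilterSketch`, `run_demo`) is hypothetical, and it assumes a single `i64` sort key with `ORDER BY v ASC LIMIT k`. Instead of sharing the whole heap, the TopK side publishes only its current worst kept value behind an `Arc<RwLock<..>>`, and the scan-side filter reads it to skip rows that can no longer enter the top k. A real `TopKRuntimeFilter` would presumably implement `PhysicalExpr` and evaluate over record batches; that part is not shown.

```rust
use std::collections::BinaryHeap;
use std::sync::{Arc, RwLock};

/// Hypothetical shared state: the worst value currently kept by the TopK heap.
/// `None` means the heap is not yet full, so nothing can be pruned.
#[derive(Clone, Default)]
struct SharedThreshold(Arc<RwLock<Option<i64>>>);

impl SharedThreshold {
    fn get(&self) -> Option<i64> {
        *self.0.read().unwrap()
    }
    fn set(&self, v: i64) {
        *self.0.write().unwrap() = Some(v);
    }
}

/// Stand-in for the TopK operator (ORDER BY v ASC LIMIT k).
struct TopKSketch {
    k: usize,
    heap: BinaryHeap<i64>, // max-heap: the top is the largest of the kept values
    threshold: SharedThreshold,
}

impl TopKSketch {
    fn new(k: usize, threshold: SharedThreshold) -> Self {
        Self { k, heap: BinaryHeap::new(), threshold }
    }

    fn insert(&mut self, v: i64) {
        if self.heap.len() < self.k {
            self.heap.push(v);
        } else {
            match self.heap.peek().copied() {
                // The new value beats the current worst: replace it.
                Some(max) if v < max => {
                    self.heap.pop();
                    self.heap.push(v);
                }
                // Otherwise the row cannot enter the top k.
                _ => return,
            }
        }
        // Once the heap is full, publish the current worst value.
        if self.heap.len() == self.k {
            if let Some(max) = self.heap.peek().copied() {
                self.threshold.set(max);
            }
        }
    }

    fn into_sorted(self) -> Vec<i64> {
        self.heap.into_sorted_vec()
    }
}

/// Stand-in for the filter attached to a scan: keeps a row only if it
/// could still enter the top k according to the shared threshold.
struct RuntimeFilterSketch {
    threshold: SharedThreshold,
}

impl RuntimeFilterSketch {
    fn keep(&self, v: i64) -> bool {
        match self.threshold.get() {
            Some(t) => v < t,
            None => true,
        }
    }
}

/// Drives the "scan" and the TopK in one loop; returns the top-k values
/// in ascending order and the number of rows pruned by the filter.
fn run_demo(input: &[i64], k: usize) -> (Vec<i64>, usize) {
    let threshold = SharedThreshold::default();
    let filter = RuntimeFilterSketch { threshold: threshold.clone() };
    let mut topk = TopKSketch::new(k, threshold);
    let mut pruned = 0;
    for &v in input {
        if filter.keep(v) {
            topk.insert(v);
        } else {
            pruned += 1;
        }
    }
    (topk.into_sorted(), pruned)
}
```

Calling `run_demo(&[5, 1, 9, 3, 7, 2, 8], 3)` returns `([1, 2, 3], 2)`: the values 7 and 8 are rejected at the "scan" because the threshold has already dropped to 5 and then 3. Publishing a single threshold keeps the lock's critical section tiny, which may be one way to approach the concurrent-access question; whether a lock, an atomic, or something else fits best in DataFusion is left open here.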