gabotechs commented on PR #20901:
URL: https://github.com/apache/datafusion/pull/20901#issuecomment-4129375485

   I think a lot of the limitations and the weirdness that this PR, 
https://github.com/apache/datafusion/pull/20331 and 
https://github.com/apache/datafusion/pull/20416 show steam from the fact that 
we need "force" dynamic filters to behave as normal `dyn PhysicalExpr`.
   
   At the moment of introducing dynamic filters in this project, implementing 
`DynamicFilterPhysicalExpr` as a non-special case of `PhysicalExpr` was a good 
decision, but I wonder if a good evolution would be to treat dynamic filters as 
"first class citizens", and promote them to `DynamicFilter`, with no chains to 
`PhysicalExpr` (besides the fact that they **produce** PhysicalExprs instead of 
**being** PhysicalExprs).
   
   One thing I noticed is that, by modeling dynamic filters as yet another case 
of `PhysicalExpr`, several needs unrelated to other physical expressions 
appear, which results in several new methods in the `PhsysicalExpr` trait that 
are only useful for the dynamic filters case, and are left unimplemented in 
every other case:
   - 
[PhysicalExpr::snapshot](https://github.com/apache/datafusion/blob/9b726bcf288e7779b855ead5b869b0a69aba5c92/datafusion/physical-expr-common/src/physical_expr.rs#L394-L394)
   - 
[PhysicalExpr::snapshot_generation](https://github.com/apache/datafusion/blob/9b726bcf288e7779b855ead5b869b0a69aba5c92/datafusion/physical-expr-common/src/physical_expr.rs#L410-L410)
   
   Sometimes, they directly cannot be treated as normal `PhysicalExpr`, needing 
to hold references to the specific `DynamicFilterPhysicalExpr` implementation, 
rather than a generic `PhysicalExpr`:
   - 
[AggrDynFilter.filter](https://github.com/apache/datafusion/blob/9b726bcf288e7779b855ead5b869b0a69aba5c92/datafusion/physical-plan/src/aggregates/mod.rs#L554-L554)
   - 
[HashJoinExecDynamicFilter.filter](https://github.com/apache/datafusion/blob/9b726bcf288e7779b855ead5b869b0a69aba5c92/datafusion/physical-plan/src/joins/hash_join/exec.rs#L766-L766)
   - 
[TopKDynamicFilters.filter](https://github.com/apache/datafusion/blob/9b726bcf288e7779b855ead5b869b0a69aba5c92/datafusion/physical-plan/src/topk/mod.rs#L145-L145)
   
   A consequence of this, is that further contributions to extend dynamic 
filters in functionality need to pass through adding new methods to 
`PhysicalExpr` that are only implemented for `DynamicFilterPhysicalExpr`, and 
ignored in every other case:
   - 
[PhysicalExpr::as_dedupable](https://github.com/jayshrivastava/datafusion/blob/js/dedupe-dynamic-filter-inner-state/datafusion/physical-expr-common/src/physical_expr.rs#L457-L457)
 from https://github.com/apache/datafusion/pull/20416
   - 
[PhysicalExpr::bind_runtime](https://github.com/gene-bordegaray/datafusion/blob/10a36f01c16ac9a4a19f116ee9885b3893c2d88a/datafusion/physical-expr-common/src/physical_expr.rs#L433)
   
   Or add contributions in other parts of the project that claim to be "general 
purpose", but in practice they are only useful for dynamic filters:
   - 
[PhysicalExprNode.internal_id](https://github.com/jayshrivastava/datafusion/blob/js/dedupe-dynamic-filter-inner-state/datafusion/proto/proto/datafusion.proto#L891-L891)
   - 
[DeduplicatingSerializer](https://github.com/apache/datafusion/blob/9b726bcf288e7779b855ead5b869b0a69aba5c92/datafusion/proto/src/physical_plan/mod.rs#L3842-L3842)
 (not sure about this one, I get the impression that only dynamic filters are 
actually deduped, as no other expression is referenced more than once in a 
plan, but maybe I'm wrong)
   
   Doing some research, Trino represents dynamic filters as first class 
citizens, with a dedicated interface:
   - 
[DynamicFilter](https://github.com/trinodb/trino/blob/442557460b901f3b0cc37312c1d5a485b5c38bae/core/trino-spi/src/main/java/io/trino/spi/connector/DynamicFilter.java#L21)
   
   And Trino connectors are expected to interact with dynamic filters by 
directly passing them as separate arguments
   - 
[ConnectorSplitManager#getSplits()](https://github.com/trinodb/trino/blob/442557460b901f3b0cc37312c1d5a485b5c38bae/core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorSplitManager.java#L24)
   
   A lot of the reservations I have with the way this PR and 
https://github.com/apache/datafusion/pull/20416 are proposing to extend dynamic 
filters in functionality arise from the fact that the blast radius of these 
contributions is pretty big, as they need to rely on extending very central 
pillars of DataFusion like `PhysicalExpr` for getting the job done, when the 
actual functionality is scoped to just dynamic filters.
   
   IMHO, If we had a way of contributing functionality to dynamic filters 
without necessarily touching the very core of DataFusion, the flow of these 
contributions would be much smoother, and I could imagine how representing 
dynamic filters, not as `PhysicalExpr` implementations, but as first class 
`DynamicFilter` structs, has good chances of delivering such vision.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to