gabotechs commented on PR #20901: URL: https://github.com/apache/datafusion/pull/20901#issuecomment-4129375485
I think a lot of the limitations and the weirdness that this PR, https://github.com/apache/datafusion/pull/20331 and https://github.com/apache/datafusion/pull/20416 show steam from the fact that we need "force" dynamic filters to behave as normal `dyn PhysicalExpr`. At the moment of introducing dynamic filters in this project, implementing `DynamicFilterPhysicalExpr` as a non-special case of `PhysicalExpr` was a good decision, but I wonder if a good evolution would be to treat dynamic filters as "first class citizens", and promote them to `DynamicFilter`, with no chains to `PhysicalExpr` (besides the fact that they **produce** PhysicalExprs instead of **being** PhysicalExprs). One thing I noticed is that, by modeling dynamic filters as yet another case of `PhysicalExpr`, several needs unrelated to other physical expressions appear, which results in several new methods in the `PhsysicalExpr` trait that are only useful for the dynamic filters case, and are left unimplemented in every other case: - [PhysicalExpr::snapshot](https://github.com/apache/datafusion/blob/9b726bcf288e7779b855ead5b869b0a69aba5c92/datafusion/physical-expr-common/src/physical_expr.rs#L394-L394) - [PhysicalExpr::snapshot_generation](https://github.com/apache/datafusion/blob/9b726bcf288e7779b855ead5b869b0a69aba5c92/datafusion/physical-expr-common/src/physical_expr.rs#L410-L410) Sometimes, they directly cannot be treated as normal `PhysicalExpr`, needing to hold references to the specific `DynamicFilterPhysicalExpr` implementation, rather than a generic `PhysicalExpr`: - [AggrDynFilter.filter](https://github.com/apache/datafusion/blob/9b726bcf288e7779b855ead5b869b0a69aba5c92/datafusion/physical-plan/src/aggregates/mod.rs#L554-L554) - [HashJoinExecDynamicFilter.filter](https://github.com/apache/datafusion/blob/9b726bcf288e7779b855ead5b869b0a69aba5c92/datafusion/physical-plan/src/joins/hash_join/exec.rs#L766-L766) - [TopKDynamicFilters.filter](https://github.com/apache/datafusion/blob/9b726bcf288e7779b855ead5b869b0a69aba5c92/datafusion/physical-plan/src/topk/mod.rs#L145-L145) A consequence of this, is that further contributions to extend dynamic filters in functionality need to pass through adding new methods to `PhysicalExpr` that are only implemented for `DynamicFilterPhysicalExpr`, and ignored in every other case: - [PhysicalExpr::as_dedupable](https://github.com/jayshrivastava/datafusion/blob/js/dedupe-dynamic-filter-inner-state/datafusion/physical-expr-common/src/physical_expr.rs#L457-L457) from https://github.com/apache/datafusion/pull/20416 - [PhysicalExpr::bind_runtime](https://github.com/gene-bordegaray/datafusion/blob/10a36f01c16ac9a4a19f116ee9885b3893c2d88a/datafusion/physical-expr-common/src/physical_expr.rs#L433) Or add contributions in other parts of the project that claim to be "general purpose", but in practice they are only useful for dynamic filters: - [PhysicalExprNode.internal_id](https://github.com/jayshrivastava/datafusion/blob/js/dedupe-dynamic-filter-inner-state/datafusion/proto/proto/datafusion.proto#L891-L891) - [DeduplicatingSerializer](https://github.com/apache/datafusion/blob/9b726bcf288e7779b855ead5b869b0a69aba5c92/datafusion/proto/src/physical_plan/mod.rs#L3842-L3842) (not sure about this one, I get the impression that only dynamic filters are actually deduped, as no other expression is referenced more than once in a plan, but maybe I'm wrong) Doing some research, Trino represents dynamic filters as first class citizens, with a dedicated interface: - [DynamicFilter](https://github.com/trinodb/trino/blob/442557460b901f3b0cc37312c1d5a485b5c38bae/core/trino-spi/src/main/java/io/trino/spi/connector/DynamicFilter.java#L21) And Trino connectors are expected to interact with dynamic filters by directly passing them as separate arguments - [ConnectorSplitManager#getSplits()](https://github.com/trinodb/trino/blob/442557460b901f3b0cc37312c1d5a485b5c38bae/core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorSplitManager.java#L24) A lot of the reservations I have with the way this PR and https://github.com/apache/datafusion/pull/20416 are proposing to extend dynamic filters in functionality arise from the fact that the blast radius of these contributions is pretty big, as they need to rely on extending very central pillars of DataFusion like `PhysicalExpr` for getting the job done, when the actual functionality is scoped to just dynamic filters. IMHO, If we had a way of contributing functionality to dynamic filters without necessarily touching the very core of DataFusion, the flow of these contributions would be much smoother, and I could imagine how representing dynamic filters, not as `PhysicalExpr` implementations, but as first class `DynamicFilter` structs, has good chances of delivering such vision. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
