kumarUjjawal commented on PR #19659: URL: https://github.com/apache/datafusion/pull/19659#issuecomment-3715110275
Thanks for the feedback, I did some analysis before implementation to best of my understanding; | Approach | Pros | Cons | |--------|------|------| | **Identity-based (chosen)** | Stable hash, no breaking change, correct semantics for filter identity | Different instances with the same expression aren’t equal | | **Panic in `Hash`** | Explicit, forces use of `snapshot_physical_expr()` | **Breaking change** — any `HashSet<Arc<dyn PhysicalExpr>>` containing dynamic filters would panic | | **Include generation in hash** | Content-aware | Still violates the contract — race between `hash()` and `eq()` | | **Document only** | No code change | Silent runtime bugs remain | Current Implementation: - `PhysicalExpr` requires `Hash` Generic code (e.g., equivalence classes) uses `HashMap<Arc<dyn PhysicalExpr>, _>`. Panicking would break this for any tree containing `DynamicFilterPhysicalExpr`. - Two dynamic filters with different inner `Arc`s are different filters (independent update lifecycles), even if they have the same expression at some moment. - `with_new_children()` does the right thing Filters created via tree transformation share the inner value and compare equal, which is exactly what filter pushdown expects. The panic approach is reasonable but would be a breaking change(I think). Happy to discuss further if you think the explicitness outweighs the compatibility benefit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
