jayshrivastava commented on code in PR #20416:
URL: https://github.com/apache/datafusion/pull/20416#discussion_r2956637740
##########
datafusion/physical-expr/src/expressions/dynamic_filters.rs:
##########
@@ -327,6 +468,14 @@ impl DynamicFilterPhysicalExpr {
Arc::strong_count(self) > 1 || Arc::strong_count(&self.inner) > 1
}
+ /// Returns a unique identifier for the inner shared state.
+ ///
+ /// Useful for checking if two [`Arc<PhysicalExpr>`] with the same
+ /// underlying [`DynamicFilterPhysicalExpr`] are the same.
+ pub fn inner_id(&self) -> u64 {
+ Arc::as_ptr(&self.inner) as *const () as u64
Review Comment:
👋🏽 @adriangb @gabotechs @LiaCastaneda this PR is ready for another review.
I've attempted to create a better abstraction. I've put the details in the
PR desc but will include them here too. Let me know what you think!
> This PR aims to fix those problems by making a few changes
> 1. `PhysicalExpr::expr_id` is added. This is basically the arc pointer
which we use today. Any expression can be deduped trivially using this ID
> 2. For dynamic filter expressions, we now have a new trait
`DedupablePhysicalExpr`.
> - Its purpose is to allow `PhysicalExpr` to internally define what
`dedupe` means.
> - It adds a new concept, an `internal_expr_id` which is different than
the one provided by `PhysicalExpr::expr_id`. It indicates that the expression
cannot be deduped trivially
> - There's a `dedupe(self, other)` method which allows the implementor to
define how to dedupe two expressions
>
> Then we update the `DedupingSerializer` and `DedupingDeserializer` to
dedupe on both levels, using the `PhysicalExpr::expr_id` and using the
`internal_expr_id`.
>
> One suspicious thing is that, since dynamic filters change with time (ie.
have internal mutability unlike other `PhysicalExpr`), I thought it would be
good to snapshot the generation as well. For this reason,
`DedupablePhysicalExpr` has a concept of a `DedupeSnapshot` which provides the
`internal_expr_id`. Both must be captured **atomically** to serialize the
dynamic filter expr.
>
> This PR does not
> - add a test for the `HashJoinExec` and `DataSourceExec` filter pushdown
case, but this is relevant follow up work.
> - PhysicalExtensionCodec methods should take a converter as arguments.
Distributed datafusion needs to use converters, specifically the deduping one
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]