jayshrivastava commented on code in PR #20416:
URL: https://github.com/apache/datafusion/pull/20416#discussion_r2829875181
##########
datafusion/physical-expr/src/expressions/dynamic_filters.rs:
##########
@@ -327,6 +467,14 @@ impl DynamicFilterPhysicalExpr {
Arc::strong_count(self) > 1 || Arc::strong_count(&self.inner) > 1
}
+ /// Returns a unique identifier for the inner shared state.
+ ///
+ /// Useful for checking if two [`Arc<PhysicalExpr>`] with the same
+ /// underlying [`DynamicFilterPhysicalExpr`] are the same.
+ pub fn inner_id(&self) -> u64 {
+ Arc::as_ptr(&self.inner) as *const () as u64
+ }
Review Comment:
It sounds like all of these cases are valid, especially since DataFusion
lets people implement their own data sources.
1. Hash join creates a dynamic filter, then the data source calls
`Arc::clone(&filter)`. The outer Arc has a count of 2, the inner has a count of 1.
2. Hash join creates a dynamic filter, then the data source calls
`reassign_expr_columns -> new_with_children` only. I'm pretty sure that in this
case there are two outer Arcs, each with a count of 1, and they point to the
same inner Arc, which has a count of 2.
3. A combination of 1, then 2, and so on, which produces `weak=2, strong=2` and
whatever other combinations.
I think this is the correct implementation of `is_used` (it is also what my
branch uses). In every case above, at least one of the two strong counts is
incremented, so checking both covers them all:
```
Arc::strong_count(self) > 1 || Arc::strong_count(&self.inner) > 1
```
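To make the counts in cases 1 and 2 concrete, here is a minimal sketch. `Filter`, `with_shared_inner`, and the `counts_*` helpers are hypothetical stand-ins written for this comment, not the real `DynamicFilterPhysicalExpr` API; `with_shared_inner` only assumes that `new_with_children` builds a new outer value that shares the same inner Arc, as described in case 2.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `DynamicFilterPhysicalExpr`:
/// an outer value that holds a shared inner state.
pub struct Filter {
    inner: Arc<u32>,
}

impl Filter {
    pub fn new() -> Arc<Self> {
        Arc::new(Filter { inner: Arc::new(0) })
    }

    /// Assumed behavior of `new_with_children`: a fresh outer value
    /// that points at the same inner Arc.
    pub fn with_shared_inner(&self) -> Arc<Self> {
        Arc::new(Filter { inner: Arc::clone(&self.inner) })
    }

    /// The check proposed above: either count being above 1 means "used".
    pub fn is_used(self: &Arc<Self>) -> bool {
        Arc::strong_count(self) > 1 || Arc::strong_count(&self.inner) > 1
    }

    /// Same idea as `inner_id` in the diff: the inner allocation's address.
    pub fn inner_id(&self) -> u64 {
        Arc::as_ptr(&self.inner) as *const () as u64
    }
}

/// No consumer: returns (outer count, inner count, is_used).
pub fn counts_unused() -> (usize, usize, bool) {
    let filter = Filter::new();
    (
        Arc::strong_count(&filter),
        Arc::strong_count(&filter.inner),
        filter.is_used(),
    )
}

/// Case 1: the consumer only calls `Arc::clone(&filter)`.
pub fn counts_after_clone() -> (usize, usize, bool) {
    let filter = Filter::new();
    let _held = Arc::clone(&filter); // kept alive until the end of the function
    (
        Arc::strong_count(&filter),
        Arc::strong_count(&filter.inner),
        filter.is_used(),
    )
}

/// Case 2: the consumer only rebuilds the outer value.
/// The last field reports whether both values share one inner id.
pub fn counts_after_rebuild() -> (usize, usize, bool, bool) {
    let filter = Filter::new();
    let rebuilt = filter.with_shared_inner();
    (
        Arc::strong_count(&filter),
        Arc::strong_count(&filter.inner),
        filter.is_used(),
        filter.inner_id() == rebuilt.inner_id(),
    )
}
```

Calling these helpers from `main` or a unit test shows that checking only the outer count would miss case 2, and checking only the inner count would miss case 1, which is why the `||` is needed.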
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]