adriangb commented on code in PR #19937:
URL: https://github.com/apache/datafusion/pull/19937#discussion_r2716947317


##########
datafusion/physical-expr/src/expressions/dynamic_filters.rs:
##########
@@ -287,17 +293,18 @@ impl DynamicFilterPhysicalExpr {
 
     /// Wait asynchronously until this dynamic filter is marked as complete.
     ///
-    /// This method returns immediately if the filter is already complete or 
if the filter
-    /// is not being used by any consumers.
+    /// This method returns immediately if the filter is already complete.
     /// Otherwise, it waits until [`Self::mark_complete`] is called.
     ///
     /// Unlike [`Self::wait_update`], this method guarantees that when it 
returns,
     /// the filter is fully complete with no more updates expected.
-    pub async fn wait_complete(self: &Arc<Self>) {
-        if !self.is_used() {
-            return;
-        }
-
+    ///
+    /// # Note
+    ///
+    /// This method should only be called on filters that have consumers. If 
you don't
+    /// know whether the filter is being used, call [`Self::is_used`] first to 
avoid
+    /// waiting indefinitely.

Review Comment:
   I think there's a bit of nuance here. This is *not* because of the 
`DynamicFilterPhysicalExpr` API itself, it's only because of how `HashJoinExec` 
is implemented.
   
   Under normal operation it would be a consumer calling `wait_complete` and 
hence it *knows* that a consumer exists because it *is* a consumer. In other 
words, under normal operation `wait_completed` is only called by consumers and 
thus `is_used` would always be true.
   
   Or put another way, the only way this could go wrong is e.g. in a test or if 
`HashJoinExec` itself called `wait_complete`. By definition if more than 1 
thing has a reference to the dynamic filter, there is a consumer. If there is 
only 1 reference, it must be the one `HashJoinExec` has (outside of tests). So 
it would have to be `HashJoinExec` that is calling `wait_complete()` right?
   
   So the scenario described here seems more like a programming error or misuse 
of the APIs, not something that could happen under normal operation of a bug 
free usage of these APIs, right? In other words: if I was implementing this I 
could probably put the `is_used()` check behind `#[cfg(debug_assertions)]` or 
something to catch a programming error on my end, but it wouldn't really make 
sense to have that check at runtime in production, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to