gabotechs commented on code in PR #21641:
URL: https://github.com/apache/datafusion/pull/21641#discussion_r3088377599
##########
datafusion/physical-plan/src/joins/hash_join/stream.rs:
##########
@@ -216,9 +214,6 @@ pub(super) struct HashJoinStream {
right_side_ordered: bool,
/// Shared build accumulator for coordinating dynamic filter updates (collects hash maps and/or bounds, optional)
build_accumulator: Option<Arc<SharedBuildAccumulator>>,
- /// Optional future to signal when build information has been reported by all partitions
- /// and the dynamic filter has been updated
- build_waiter: Option<OnceFut<()>>,
Review Comment:
One pattern that I've seen in both Trino and DataFusion data source
implementations is to accept the dynamic filter pushdown and, during
execution, wait for a grace period on the consumer side of the dynamic
filter before starting to pull data from the data source.
This means that the one responsible for deferring further execution is not
the producer of the dynamic filter, but whoever is willing to consume it.
I think we have the right APIs for this
([DynamicFilterPhysicalExpr::wait_update](https://github.com/apache/datafusion/blob/main/datafusion/physical-expr/src/expressions/dynamic_filters.rs#L283-L283)
and
[DynamicFilterPhysicalExpr::wait_complete](https://github.com/apache/datafusion/blob/main/datafusion/physical-expr/src/expressions/dynamic_filters.rs#L303)),
although I can imagine that, depending on the source of the dynamic filter
(a TopK or a Join), the decision of whether it's worth waiting could be
different.
I see that this PR does not close the door to having something like that,
so I think it should be good.
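
To make the consumer-side grace period concrete, here is a minimal sketch using only the Rust standard library. It does not call the DataFusion APIs linked above; `SharedFilter`, `publish`, `wait_for`, and `scan` are hypothetical names made up for illustration, and the "filter" is just a lower bound on an integer column. The consumer waits up to a grace period for the filter to be published and then scans with whatever it has:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a dynamic filter: `None` until the
/// producer (e.g. a join build side) publishes a lower bound.
struct SharedFilter {
    bound: Mutex<Option<i64>>,
    updated: Condvar,
}

impl SharedFilter {
    fn new() -> Self {
        Self { bound: Mutex::new(None), updated: Condvar::new() }
    }

    /// Producer side: publish the bound and wake any waiting consumers.
    fn publish(&self, bound: i64) {
        *self.bound.lock().unwrap() = Some(bound);
        self.updated.notify_all();
    }

    /// Consumer side: wait at most `grace` for the filter to arrive,
    /// then return whatever is available (possibly nothing).
    fn wait_for(&self, grace: Duration) -> Option<i64> {
        let guard = self.bound.lock().unwrap();
        let (guard, _timed_out) = self
            .updated
            .wait_timeout_while(guard, grace, |b| b.is_none())
            .unwrap();
        *guard
    }
}

/// The consumer decides whether to defer: it applies the filter only
/// if it showed up within the grace period, otherwise scans unfiltered.
fn scan(rows: &[i64], filter: &SharedFilter, grace: Duration) -> Vec<i64> {
    match filter.wait_for(grace) {
        Some(min) => rows.iter().copied().filter(|v| *v >= min).collect(),
        None => rows.to_vec(),
    }
}

fn main() {
    let rows = [1, 5, 10, 20];

    // The producer publishes within the grace period: the filter is applied.
    let filter = Arc::new(SharedFilter::new());
    let producer = {
        let f = Arc::clone(&filter);
        thread::spawn(move || f.publish(10))
    };
    let filtered = scan(&rows, &filter, Duration::from_secs(5));
    producer.join().unwrap();
    assert_eq!(filtered, vec![10, 20]);

    // Nobody publishes: the grace period elapses and the scan is unfiltered.
    let never = SharedFilter::new();
    assert_eq!(scan(&rows, &never, Duration::from_millis(10)), rows.to_vec());
}
```

The grace period lives entirely in `scan`, so each consumer could pick its own timeout (or skip waiting) depending on where the filter comes from.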