martin-g commented on code in PR #18858:
URL: https://github.com/apache/datafusion/pull/18858#discussion_r2549938092
##########
datafusion/physical-plan/src/joins/hash_join/stream.rs:
##########
@@ -351,7 +357,7 @@ impl HashJoinStream {
) -> Self {
Self {
partition,
- schema,
+ schema: Arc::clone(&schema),
Review Comment:
Do you need to clone here ? Usually this happens at the call site
##########
datafusion/physical-plan/src/joins/hash_join/stream.rs:
##########
@@ -650,6 +674,8 @@ impl HashJoinStream {
)?
};
+ self.batch_coalescer.push_batch(result)?;
Review Comment:
Maybe add a comment that ignoring `PushBatchStatus::LimitReached` result is
intentional. Same at line 762 below.
##########
datafusion/physical-plan/src/joins/hash_join/stream.rs:
##########
Review Comment:
Doesn't this also need to be improved ?
Currently it returns the result without pushing it to the coalescer.
I think it should call `self.batch_coalescer.push_batch(result)?;` and then
check for a completed batch:
```
if let Some(batch) = self.batch_coalescer.next_completed_batch() {
return Ok(StatefulStreamResult::Ready(Some(batch)));
}
return Ok(StatefulStreamResult::Continue);
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]