korowa commented on PR #9830:
URL: 
https://github.com/apache/arrow-datafusion/pull/9830#issuecomment-2028430188

   And one more thing to consider (and this is the second concern) -- if 
LeftData will be relatively small (it's natural behaviour, due to the fact that 
CrossJoin supports input reordering basing on statistics), switching from "left 
row + right batch" to "right row + left batch" might potentially produce a 
massive amount of small batches -- to avoid it we might need to either wrap 
CrossJoin into CoalesceBatches (via this 
[rule](https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/physical_optimizer/coalesce_batches.rs)),
 or to add some built-in buffering.
   
   In current implementation small left side seems to be +- safe as we assume 
that right side is either large enough to provide full-size batches, and it's 
OK to produce output batch per left-side row, or, if right-side is small, due 
to reordering, left-side should be even smaller, so it won't make much harm.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to