berkaysynnada commented on PR #9830: URL: https://github.com/apache/arrow-datafusion/pull/9830#issuecomment-2032103953
> > I can add a `CoalesceBatchesExec` to the left child, but that requires more analysis: what is the batch size of the left child, is it already a `CoalesceBatchesExec`, and if so, how would the two be merged? What I observe is that the rule adds `CoalesceBatchesExec` above plans that reduce or filter the number of rows. CrossJoin does not do that. I think all streams are written assuming they receive batches of the correct size.
>
> Yes, I was pointing out that the output of CrossJoin might need to be coalesced (even if both inputs are fine in terms of batch sizes). Here is an example: for query

Thank you for the detailed review and benchmark results. Yes, you are right. In those cases (where the left batch sizes are smaller than the target batch size), this strategy shows a drastic regression. I tried concatenating the built batch results until the target batch size is reached, but it still performs poorly (approximately 20x slower). I think I should revert the changes and just remove the lock and the ScalarValue conversions.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
