Dandandan commented on issue #235: URL: https://github.com/apache/arrow-datafusion/issues/235#issuecomment-830783405
I reproduced the bug by explicitly setting the concurrency for those tests to `24`. Here is my hypothesis about what happens:

* The left join implementation is wrong: it produces rows for left-side keys that are missing from the current right batch, rather than only for keys that are missing from the entire partition.
* The referenced commit has some changes to (re)hashing of single columns, which means that columns could end up without any right-side rows.
* We also use the same hashing code in hash-repartition, which means that the `33` row could end up in its "own" partition. In that case no right batch is processed, so no row is generated for `33`.

I have a feeling that the best fix for the general case would be to "just" fix the left join implementation. Another option might be to cherry-pick the change from PR #55 that would fix just this test: https://github.com/apache/arrow-datafusion/pull/55/files#diff-44d49c7778aa0c300afacdd7d89b0729ffaedd932d1ac34f3ef8db6b6cdfd73aR904
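
To make the hypothesis concrete, here is a minimal sketch of the two behaviours using plain integer keys instead of Arrow record batches. This is not DataFusion's code: the `Row` type and both function names are hypothetical, and nested loops stand in for the real hash lookup. The first function decides "unmatched" per right batch, so it emits nothing at all when a partition receives no right batch; the second tracks matches across the whole partition and emits unmatched left rows once at the end.

```rust
/// A joined row: the left key plus the matching right key, if any.
/// (Hypothetical type for illustration only.)
type Row = (i32, Option<i32>);

/// Hypothesized buggy behaviour: unmatched left rows are decided per right batch.
/// With zero right batches the loop body never runs, so nothing is emitted.
fn left_join_per_batch(left: &[i32], right_batches: &[Vec<i32>]) -> Vec<Row> {
    let mut out = Vec::new();
    for batch in right_batches {
        for &l in left {
            let mut matched = false;
            for &r in batch {
                if r == l {
                    out.push((l, Some(r)));
                    matched = true;
                }
            }
            // Wrong: "no match in this batch" is not "no match in the partition".
            if !matched {
                out.push((l, None));
            }
        }
    }
    out
}

/// Sketch of a fix: remember which left rows matched in any batch and
/// emit the unmatched ones only after all right batches have been seen.
fn left_join_per_partition(left: &[i32], right_batches: &[Vec<i32>]) -> Vec<Row> {
    let mut out = Vec::new();
    let mut matched = vec![false; left.len()];
    for batch in right_batches {
        for (i, &l) in left.iter().enumerate() {
            for &r in batch {
                if r == l {
                    out.push((l, Some(r)));
                    matched[i] = true;
                }
            }
        }
    }
    // Runs even when there were no right batches at all.
    for (i, &l) in left.iter().enumerate() {
        if !matched[i] {
            out.push((l, None));
        }
    }
    out
}
```

Calling `left_join_per_batch(&[33], &[])` models the `33` row landing in a partition with no right batch: the result is empty, whereas `left_join_per_partition` returns the expected `(33, None)` row.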
