zanmato1984 commented on issue #44513: URL: https://github.com/apache/arrow/issues/44513#issuecomment-2449292594
Hi @kolfild26, thanks for reporting this. Many hash join issues that could cause silent wrong answers or segfaults were fixed between v13 and v18, and there may be more undiscovered ones as well. So it is not too surprising that different versions behave differently.

Could you please provide the complete schemas and the estimated sizes of both tables? Better yet, could you give an approximate working limit for your case? This information is essential for investigating the issue.

Also, there is a workaround that may be worth a try: change `t_18m.join(t_487m, keys=['col1', 'col2', 'col3'], join_type="left outer")` to `t_487m.join(t_18m, keys=['col1', 'col2', 'col3'], join_type="right outer")`. (I assume `t_18m` is much smaller than `t_487m`; this makes our hash join use the smaller table to build the hash table.)

Thanks.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
