zanmato1984 commented on issue #44513:
URL: https://github.com/apache/arrow/issues/44513#issuecomment-2449292594

   Hi @kolfild26 , thanks for reporting this.
   
   Many hash join bugs that could cause silently wrong results or segfaults 
were fixed between v13 and v18, and there may be more that have not been 
uncovered yet. So it is not too surprising that different versions behave 
differently.
   
   Could you please provide the complete schemas and the estimated sizes of 
both tables? Better yet, could you tell us roughly the largest input size at 
which your case still works? This information is essential for investigating 
the issue.
   
   Also, there is a workaround that might be worth a try: change 
`t_18m.join(t_487m, keys=['col1', 'col2', 'col3'], join_type="left outer")` to 
`t_487m.join(t_18m, keys=['col1', 'col2', 'col3'], join_type="right outer")`. 
(I assume `t_18m` is much smaller than `t_487m`, so this will make our hash 
join use the small table to build the hash table.)
   
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
