wecharyu commented on PR #12264: URL: https://github.com/apache/gluten/pull/12264#issuecomment-4702328991
> Gluten's implementation is the same as vanilla Spark's. Just out of curiosity, can Spark pass this failed test?

Vanilla Spark does share the same hash relation, but it does not drop duplicates, so the two join types do not produce a data issue. Gluten's native hash table build, however, drops duplicates for the `LEFT SEMI JOIN`, and that deduplicated hash table is then reused by the `INNER JOIN`, which causes the issue.
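To illustrate the failure mode described above, here is a minimal Python sketch. It is not Gluten or Spark code: `build_hash_table`, `left_semi_join`, `inner_join`, and the `drop_duplicates` flag are hypothetical names, and the rows are made-up sample data. It only models the idea that a build-side table deduplicated for a semi join loses rows that an inner join still needs.

```python
from collections import defaultdict


def build_hash_table(rows, key, drop_duplicates=False):
    """Build a key -> list-of-rows table from the build side (hypothetical helper)."""
    table = defaultdict(list)
    for row in rows:
        k = row[key]
        # Assumed semi-join optimisation: keep only the first row per key.
        if drop_duplicates and table[k]:
            continue
        table[k].append(row)
    return dict(table)


def left_semi_join(probe, table, key):
    """Emit each probe row at most once if its key exists in the table."""
    return [p for p in probe if p[key] in table]


def inner_join(probe, table, key):
    """Emit one output pair per matching build row."""
    return [(p, b) for p in probe for b in table.get(p[key], [])]


# Made-up sample data: key 1 appears twice on the build side.
build = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
probe = [{"id": 1}, {"id": 2}]

full = build_hash_table(build, "id")
dedup = build_hash_table(build, "id", drop_duplicates=True)

# The semi join only checks key existence, so deduplication is harmless.
semi_full = left_semi_join(probe, full, "id")
semi_dedup = left_semi_join(probe, dedup, "id")

# The inner join needs every build row; reusing the deduplicated table loses one.
inner_full = inner_join(probe, full, "id")
inner_reused = inner_join(probe, dedup, "id")
```

In this sketch the semi join returns the same rows with either table, while the inner join over the reused, deduplicated table returns fewer rows than the one over the full table. That matches the behaviour attributed above to sharing the `LEFT SEMI JOIN` hash table with the `INNER JOIN`.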
