lgbo-ustc opened a new issue, #6768: URL: https://github.com/apache/incubator-gluten/issues/6768
### Backend

CH (ClickHouse)

### Bug description

We hit a query in our production environment that performs very badly on the join. The query looks like this:

```sql
select * from t1 left join t2 on t1.uid = t2.uid and (t1.id1 = t2.id1 or t1.id2 = t2.id2 or t1.id3 = t2.id3)
```

There are two main problems.

First, the right table is very large, over 5,000,000,000 rows. Using it to build the join hash table is very resource intensive.

Second, when only the join condition `t1.uid = t2.uid` is applied, it can produce a very large set of matching rows, > 5,000,000,000 * 100. But after the filter condition `(t1.id1 = t2.id1 or t1.id2 = t2.id2 or t1.id3 = t2.id3)` is applied to this matching result, fewer than 100,000 rows are left.

### Spark version

None

### Spark configurations

_No response_

### System information

_No response_

### Relevant logs

_No response_
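To make the second problem concrete, here is a minimal Python sketch of a hash join that uses only `uid` as the hash key and applies the `or` condition as a post-filter. It is a toy model for illustration, not Gluten or ClickHouse code; the function name `left_join_with_post_filter` and the synthetic rows are assumptions made up for this example.

```python
from collections import defaultdict


def left_join_with_post_filter(t1, t2):
    """Left join on uid via a hash table, then filter candidates with the OR condition.

    Returns (result_rows, candidate_pairs), where candidate_pairs counts every
    pair that matched on uid alone, before the OR filter was evaluated.
    """
    # Build the hash table on the right table, keyed by uid only.
    table = defaultdict(list)
    for r in t2:
        table[r["uid"]].append(r)

    result_rows = []
    candidate_pairs = 0
    for l in t1:
        matched = False
        for r in table.get(l["uid"], ()):
            candidate_pairs += 1  # pair produced by the equi-join key alone
            if l["id1"] == r["id1"] or l["id2"] == r["id2"] or l["id3"] == r["id3"]:
                result_rows.append((l, r))
                matched = True
        if not matched:
            # Left join: keep the left row even when nothing passes the filter.
            result_rows.append((l, None))
    return result_rows, candidate_pairs


# Synthetic data (hypothetical): every row shares uid = 1, ids almost never match.
t1 = [{"uid": 1, "id1": i, "id2": 100 + i, "id3": 200 + i} for i in range(10)]
t2 = [{"uid": 1, "id1": 1000 + j, "id2": 2000 + j, "id3": 3000 + j} for j in range(100)]
t2[0]["id2"] = 100  # make exactly one pair satisfy the OR condition

rows, candidates = left_join_with_post_filter(t1, t2)
matched_rows = sum(1 for _, r in rows if r is not None)
print(candidates, matched_rows, len(rows))
```

With this toy data the join inspects 1,000 candidate pairs on `uid` to keep a single real match, which is the same shape of blow-up described above, only at a much smaller scale.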
