[ https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
angerszhu updated SPARK-48027:
------------------------------
    Description: 
{code:java}
with 
refund_info as (
  select
    loan_id,
    1 as refund_type
  from
    default.table_b
  where grass_date = '2024-04-25'
),
next_month_time as (
  select /*+ broadcast(b, c) */
    loan_id
    ,1 as final_repayment_time
  FROM default.table_c
  where grass_date = '2024-04-25'
)
select
  a.loan_id
  ,c.final_repayment_time
  ,b.refund_type
from
  (select
     loan_id
   from
     default.table_a2
   where grass_date = '2024-04-25'
   union all
   select
     loan_id
   from
     default.table_a1
   where grass_date = '2024-04-24'
  ) a
left join
  refund_info b
on a.loan_id = b.loan_id
left join
  next_month_time c
on a.loan_id = c.loan_id
;
{code}
!image-2024-04-28-16-38-37-510.png|width=899,height=201!

In this query, the optimizer injects table_b as the runtime filter for table_c, but table_b is joined with LEFT OUTER, so rows from table_c go missing.

This is caused by InjectRuntimeFilter.extractSelectiveFilterOverScan(): when it handles a join, the left plan is a UNION, so the result for the left side is NONE. It then zips the left and right keys and extracts from the right side, which causes this issue.
!image-2024-04-28-16-41-08-392.png|width=883,height=706!

  was:
{code:java}
with 
refund_info as (
  select
    loan_id,
    1 as refund_type
  from
    credit.table_b
  where grass_date = '2024-04-25'
),
next_month_time as (
  select /*+ broadcast(b, c) */
    loan_id
    ,1 as final_repayment_time
  FROM credit.table_c
  where grass_date = '2024-04-25'
)
select
  a.loan_id
  ,c.final_repayment_time
  ,b.refund_type
from
  (select
     loan_id
   from
     credit_fund.table_a2
   where grass_date = '2024-04-25' -- loans newly sold on the current day
   union all
   select
     loan_id
   from
     credit_fund.table_a1
   where grass_date = '2024-04-24'
     and loan_abs_status != 600 -- historical cumulative cutoff loans; filter out loans revoked again after cutoff (status transition 100 -> 600)
  ) a
left join
  refund_info b
on a.loan_id = b.loan_id
left join
  next_month_time c
on a.loan_id = c.loan_id
;
{code}
!image-2024-04-28-16-38-37-510.png|width=899,height=201!

In this query, the optimizer injects table_b as the runtime filter for table_c, but table_b is joined with LEFT OUTER, so rows from table_c go missing.
This is caused by InjectRuntimeFilter.extractSelectiveFilterOverScan(): when it handles a join, the left plan is a UNION, so the result for the left side is NONE. It then zips the left and right keys and extracts from the right side, which causes this issue.
!image-2024-04-28-16-41-08-392.png|width=883,height=706!


> InjectRuntimeFilter for multi-level join should check child join type
> ---------------------------------------------------------------------
>
>                 Key: SPARK-48027
>                 URL: https://issues.apache.org/jira/browse/SPARK-48027
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0, 3.5.1, 3.4.3
>            Reporter: angerszhu
>            Priority: Major
>         Attachments: image-2024-04-28-16-38-37-510.png, image-2024-04-28-16-41-08-392.png
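
To see why a filter built from table_b must not be applied to table_c here, the following is a minimal sketch that models the query with plain Python lists. It does not use Spark and is not the InjectRuntimeFilter implementation; the helper names (left_outer_join, runtime_filter, run_query) and the sample rows are hypothetical and chosen only for illustration. It assumes loan_id is unique on the right side of each join.

```python
# Toy model of the reported behaviour. Plain lists stand in for tables;
# all rows and helper names are hypothetical, not Spark internals.

def left_outer_join(left_rows, right_rows, key, right_col):
    """LEFT OUTER join on `key`; unmatched left rows get None for `right_col`.
    Assumes `key` is unique in `right_rows`."""
    lookup = {r[key]: r[right_col] for r in right_rows}
    return [{**l, right_col: lookup.get(l[key])} for l in left_rows]

def runtime_filter(rows, key, creation_side_rows):
    """Keep only rows whose key also appears on the filter creation side."""
    allowed = {r[key] for r in creation_side_rows}
    return [r for r in rows if r[key] in allowed]

# `a` stands for the UNION ALL of table_a2 and table_a1.
table_a = [{"loan_id": 1}, {"loan_id": 2}, {"loan_id": 3}]
table_b = [{"loan_id": 1, "refund_type": 1}]
table_c = [
    {"loan_id": 1, "final_repayment_time": 1},
    {"loan_id": 2, "final_repayment_time": 1},
    {"loan_id": 3, "final_repayment_time": 1},
]

def run_query(c_rows):
    """a LEFT JOIN b LEFT JOIN c, joining on loan_id each time."""
    ab = left_outer_join(table_a, table_b, "loan_id", "refund_type")
    return left_outer_join(ab, c_rows, "loan_id", "final_repayment_time")

# Correct result: table_c is scanned in full.
expected = run_query(table_c)

# Reported behaviour: table_c is pre-filtered by the keys of table_b,
# although a LEFT OUTER join does not restrict a.loan_id to b's keys.
buggy = run_query(runtime_filter(table_c, "loan_id", table_b))

for good, bad in zip(expected, buggy):
    print(good, bad)
```

In this model the rows for loan_id 2 and 3 keep their final_repayment_time in the correct result but lose it once table_c is pre-filtered by table_b. The filter would only be safe if the join between a and b restricted a.loan_id to the keys present in b, as an inner join does, which is the join-type check the issue title asks for.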