[ https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
angerszhu updated SPARK-48027:
------------------------------
Description:
{code:sql}
with
refund_info as (
  select
    loan_id,
    1 as refund_type
  from
    credit.table_b
  where grass_date = '2024-04-25'
),
next_month_time as (
  select /*+ broadcast(b, c) */
    loan_id
    ,1 as final_repayment_time
  FROM credit.table_c
  where grass_date = '2024-04-25'
)
select
  a.loan_id
  ,c.final_repayment_time
  ,b.refund_type
from
  (select
     loan_id
   from
     credit_fund.table_a2
   where grass_date = '2024-04-25' -- loans newly sold that day
   union all
   select
     loan_id
   from
     credit_fund.table_a1
   where grass_date = '2024-04-24' and loan_abs_status != 600
   -- cumulative historical cut-off loans, excluding loans revoked again after cutoff (status transition 100 -> 600)
  ) a
left join
  refund_info b
  on a.loan_id = b.loan_id
left join
  next_month_time c
  on a.loan_id = c.loan_id
;
{code}
!image-2024-04-28-16-38-37-510.png|width=899,height=201!
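In this query, InjectRuntimeFilter injects table_b as a runtime filter on table_c, but table_b is joined with a LEFT OUTER join, so its keys do not bound the keys of the join output. The filter therefore prunes table_c rows whose loan_id appears in a but not in table_b, and final_repayment_time is missing (NULL) for those rows.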
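The effect can be reproduced outside Spark with Python's built-in sqlite3 module. This is a minimal sketch on hypothetical toy data: tables a, b and c stand in for the union subquery, refund_info and next_month_time, and an exact IN filter stands in for Spark's bloom filter (a bloom filter can only let extra rows through, never keep rows whose keys it has not seen, so the missing rows are the same).

```python
import sqlite3

# Hypothetical toy data: three loans in a, only loan 1 has a refund in b,
# every loan has a repayment time in c.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (loan_id INTEGER);
CREATE TABLE b (loan_id INTEGER, refund_type INTEGER);
CREATE TABLE c (loan_id INTEGER, final_repayment_time INTEGER);
INSERT INTO a VALUES (1), (2), (3);
INSERT INTO b VALUES (1, 1);
INSERT INTO c VALUES (1, 1), (2, 1), (3, 1);
""")

# Same shape as the report: a LEFT JOIN b LEFT JOIN c, all on loan_id.
QUERY = """
SELECT a.loan_id, c.final_repayment_time, b.refund_type
FROM a
LEFT JOIN b ON a.loan_id = b.loan_id
LEFT JOIN {c_source} c ON a.loan_id = c.loan_id
ORDER BY a.loan_id
"""


def run(c_source):
    return conn.execute(QUERY.format(c_source=c_source)).fetchall()


# Correct plan: table c is scanned without a runtime filter.
correct = run("c")
# Faulty plan: c is pruned to the join keys of b before the outer join,
# which is what injecting table_b as c's runtime filter amounts to.
pruned = run("(SELECT * FROM c WHERE loan_id IN (SELECT loan_id FROM b))")

print(correct)  # [(1, 1, 1), (2, 1, None), (3, 1, None)]
print(pruned)   # [(1, 1, 1), (2, None, None), (3, None, None)]
```

Loans 2 and 3 lose their final_repayment_time only because they have no refund in b, which matches the symptom described above.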
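The cause is InjectRuntimeFilter.extractSelectiveFilterOverScan(). When it handles a Join node, it first tries to extract a selective filter from the left child; because the left child here is a UNION, that returns None. It then zips the left and right join keys and extracts the filter from the right child (table_b) without checking the join type. That fallback is only valid when every output row matched a right-side row, which does not hold for a LEFT OUTER join, and that causes this issue.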
!image-2024-04-28-16-41-08-392.png|width=883,height=706!
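The child join type check that the title asks for can be sketched in Python on a toy plan tree. Scan, UnionNode, Join and extract_selective_scan are hypothetical stand-ins, not Spark's Catalyst classes or the actual fix, and the sketch assumes the creation-side key always comes from the left child of each join.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Scan:
    table: str
    has_filter: bool  # True if a selective predicate sits directly over the scan


@dataclass
class UnionNode:
    children: List[object]


@dataclass
class Join:
    left: object
    right: object
    join_type: str  # e.g. "Inner", "LeftOuter"


def extract_selective_scan(plan: object) -> Optional[str]:
    """Return a filtered scan whose keys bound the creation-side key, or None."""
    if isinstance(plan, Scan):
        return plan.table if plan.has_filter else None
    if isinstance(plan, UnionNode):
        return None  # unions are not inspected, as in the report
    if isinstance(plan, Join):
        # Output values of a left-side key come from the left child (or are
        # NULL), so a filter found there is safe for any join type.
        found = extract_selective_scan(plan.left)
        if found is not None:
            return found
        # Falling back to the right child via the zipped keys is only safe if
        # every output row matched a right-side row; this is the missing check.
        if plan.join_type == "Inner":
            return extract_selective_scan(plan.right)
    return None


a = UnionNode([Scan("credit_fund.table_a2", True), Scan("credit_fund.table_a1", True)])
b = Scan("credit.table_b", True)
print(extract_selective_scan(Join(a, b, "LeftOuter")))  # None
print(extract_selective_scan(Join(a, b, "Inner")))      # credit.table_b
```

Only Inner is accepted for the right-hand fallback here as a conservative choice; which other join types are safe is left to the actual fix.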
> InjectRuntimeFilter for multi-level join should check child join type
> ---------------------------------------------------------------------
>
> Key: SPARK-48027
> URL: https://issues.apache.org/jira/browse/SPARK-48027
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0, 3.5.1, 3.4.3
> Reporter: angerszhu
> Priority: Major
> Attachments: image-2024-04-28-16-38-37-510.png,
> image-2024-04-28-16-41-08-392.png
>
>