jayzhan211 commented on PR #15281:
URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2764483579
```
statement count 0
create table t(a int, b int) as values (11, 2), (3, 0);
statement count 0
create table t2(a int, b int) as values (11, 3), (13, 1);
query IT
select e.b ,(select case when max(e2.a) > 10 then 'a' else 'b' end from t2
e2 where e2.b = e.b+1 ) from t e;
----
0 a
2 a
query TT
explain select e.b ,(select case when max(e2.a) > 10 then 'a' else 'b' end
from t2 e2 where e2.b = e.b+1 ) from t e;
----
logical_plan
01)Projection: e.b, CASE WHEN __scalar_sq_1.__always_true IS NULL THEN
Utf8("b") ELSE __scalar_sq_1.CASE WHEN max(e2.a) > Int64(10) THEN Utf8("a")
ELSE Utf8("b") END END AS CASE WHEN max(e2.a) > Int64(10) THEN Utf8("a") ELSE
Utf8("b") END
02)--Left Join: CAST(e.b AS Int64) + Int64(1) = CAST(__scalar_sq_1.b AS
Int64)
03)----SubqueryAlias: e
04)------TableScan: t projection=[b]
05)----SubqueryAlias: __scalar_sq_1
06)------Projection: CASE WHEN max(e2.a) > Int32(10) THEN Utf8("a") ELSE
Utf8("b") END AS CASE WHEN max(e2.a) > Int64(10) THEN Utf8("a") ELSE Utf8("b")
END, e2.b, Boolean(true) AS __always_true
07)--------Aggregate: groupBy=[[e2.b]], aggr=[[max(e2.a)]]
08)----------SubqueryAlias: e2
09)------------TableScan: t2 projection=[a, b]
physical_plan
01)ProjectionExec: expr=[b@0 as b, CASE WHEN __always_true@2 IS NULL THEN b
ELSE CASE WHEN max(e2.a) > Int64(10) THEN Utf8("a") ELSE Utf8("b") END@1 END as
CASE WHEN max(e2.a) > Int64(10) THEN Utf8("a") ELSE Utf8("b") END]
02)--CoalesceBatchesExec: target_batch_size=8192
03)----HashJoinExec: mode=Partitioned, join_type=Left, on=[(e.b +
Int64(1)@1, CAST(__scalar_sq_1.b AS Int64)@3)], projection=[b@0, CASE WHEN
max(e2.a) > Int64(10) THEN Utf8("a") ELSE Utf8("b") END@2, __always_true@4]
04)------CoalesceBatchesExec: target_batch_size=8192
05)--------RepartitionExec: partitioning=Hash([e.b + Int64(1)@1], 4),
input_partitions=1
06)----------ProjectionExec: expr=[b@0 as b, CAST(b@0 AS Int64) + 1 as e.b +
Int64(1)]
07)------------DataSourceExec: partitions=1, partition_sizes=[1]
08)------CoalesceBatchesExec: target_batch_size=8192
09)--------RepartitionExec: partitioning=Hash([CAST(__scalar_sq_1.b AS
Int64)@3], 4), input_partitions=4
10)----------ProjectionExec: expr=[CASE WHEN max(e2.a)@1 > 10 THEN a ELSE b
END as CASE WHEN max(e2.a) > Int64(10) THEN Utf8("a") ELSE Utf8("b") END, b@0
as b, true as __always_true, CAST(b@0 AS Int64) as CAST(__scalar_sq_1.b AS
Int64)]
11)------------AggregateExec: mode=FinalPartitioned, gby=[b@0 as b],
aggr=[max(e2.a)]
12)--------------CoalesceBatchesExec: target_batch_size=8192
13)----------------RepartitionExec: partitioning=Hash([b@0], 4),
input_partitions=4
14)------------------RepartitionExec: partitioning=RoundRobinBatch(4),
input_partitions=1
15)--------------------AggregateExec: mode=Partial, gby=[b@1 as b],
aggr=[max(e2.a)]
16)----------------------DataSourceExec: partitions=1, partition_sizes=[1]
```
I guess we incorrectly consider them as "unmatched columns", even if the
filter matched, we still generate `CASE WHEN __scalar_sq_1.__always_true IS
NULL` condition. Ideally, we should avoid this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]