Bruce Robbins created SPARK-43113:
-------------------------------------

             Summary: Codegen error when full outer join's bound condition has multiple references to the same stream-side column
                 Key: SPARK-43113
                 URL: https://issues.apache.org/jira/browse/SPARK-43113
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.2, 3.4.0, 3.5.0
            Reporter: Bruce Robbins


Example #1 (sort merge join):
{noformat}
create or replace temp view v1 as
select * from values
(1, 1),
(2, 2),
(3, 1)
as v1(key, value);

create or replace temp view v2 as
select * from values
(1, 22, 22),
(3, -1, -1),
(7, null, null)
as v2(a, b, c);

select *
from v1
full outer join v2
on key = a
and value > b
and value > c;
{noformat}
The join's generated code causes the following compilation error:
{noformat}
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 277, Column 9: Redefinition of local variable "smj_isNull_7"
{noformat}
Example #2 (shuffle hash join):
{noformat}
select /*+ SHUFFLE_HASH(v2) */ *
from v1
full outer join v2
on key = a
and value > b
and value > c;
{noformat}
The shuffle hash join's generated code causes the following compilation error:
{noformat}
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 174, Column 5: Redefinition of local variable "shj_value_1"
{noformat}
With the default configuration, both queries still succeed, because Spark falls 
back to running each query with whole-stage codegen disabled.
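
A minimal sketch for surfacing the compilation error instead of letting the fallback hide it. It assumes the internal setting spark.sql.codegen.fallback (default true) controls this fallback in the affected versions. The commented-out alternative, spark.sql.codegen.wholeStage, is shown only as a possible workaround and has not been verified against this issue:
{noformat}
-- Option A (assumption): disable the fallback so the CompileException
-- reaches the caller instead of being logged and bypassed.
set spark.sql.codegen.fallback=false;

-- Option B (alternative to A, assumption): skip whole-stage codegen
-- entirely, so the faulty generated code is never compiled.
-- set spark.sql.codegen.wholeStage=false;

-- Re-run the query from Example #1 under the chosen setting.
select *
from v1
full outer join v2
on key = a
and value > b
and value > c;
{noformat}
With Option A in effect, the query would be expected to fail with the error shown above rather than succeed silently.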

The issue happens only when the join's bound condition refers to the same 
stream-side column more than once.
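
For contrast, a sketch of a variant whose bound condition refers to the stream-side column value only once. Based solely on the statement above, this form would be expected to compile without the redefinition error. That expectation is an assumption and has not been confirmed here:
{noformat}
-- Same views as Example #1; "value" appears once in the bound condition.
select *
from v1
full outer join v2
on key = a
and value > b;
{noformat}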


