Bruce Robbins created SPARK-43113:
-------------------------------------

Summary: Codegen error when full outer join's bound condition has multiple references to the same stream-side column
Key: SPARK-43113
URL: https://issues.apache.org/jira/browse/SPARK-43113
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.3.2, 3.4.0, 3.5.0
Reporter: Bruce Robbins

Example #1 (sort merge join):
{noformat}
create or replace temp view v1 as
select * from values
(1, 1),
(2, 2),
(3, 1)
as v1(key, value);

create or replace temp view v2 as
select * from values
(1, 22, 22),
(3, -1, -1),
(7, null, null)
as v2(a, b, c);

select *
from v1
full outer join v2
on key = a
and value > b
and value > c;
{noformat}
The join's generated code causes the following compilation error:
{noformat}
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 277, Column 9: Redefinition of local variable "smj_isNull_7"
{noformat}

Example #2 (shuffle hash join):
{noformat}
select /*+ SHUFFLE_HASH(v2) */ *
from v1
full outer join v2
on key = a
and value > b
and value > c;
{noformat}
The shuffle hash join's generated code causes the following compilation error:
{noformat}
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 174, Column 5: Redefinition of local variable "shj_value_1"
{noformat}

With default configuration, both queries end up succeeding, since Spark falls back to running each query with whole-stage codegen disabled.

The issue happens only when the join's bound condition refers to the same stream-side column more than once.
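
As a rough illustration of the triggering pattern only, the following sketch checks whether a bound condition mentions any stream-side column more than once. It is not part of Spark: the helper name `repeated_stream_refs` and its naive identifier tokenizer are hypothetical, and it assumes that v1 (columns key, value) is the stream side in both examples and that `value > b and value > c` is the bound (non-equi) part of the join condition.

```python
import re
from collections import Counter


def repeated_stream_refs(bound_condition, stream_columns):
    """Return the stream-side columns referenced more than once.

    Hypothetical helper for illustration; Spark does not expose this.
    The tokenizer is naive: it ignores quoting, table qualifiers and
    string literals, and compares names case-insensitively.
    """
    wanted = {c.lower() for c in stream_columns}
    # Pull out bare identifiers (keywords such as "and" are filtered below).
    tokens = re.findall(r"[A-Za-z_][A-Za-z_0-9]*", bound_condition.lower())
    counts = Counter(t for t in tokens if t in wanted)
    return {col: n for col, n in counts.items() if n > 1}


# Bound condition from both examples: "value" is referenced twice.
print(repeated_stream_refs("value > b and value > c", {"key", "value"}))

# A condition with a single reference to "value" reports nothing.
print(repeated_stream_refs("value > b", {"key", "value"}))
```

The first call reports `value` with a count of 2, which matches the shape of the conditions in Example #1 and Example #2. The second call returns an empty result, which is the shape the report says does not trigger the error.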