[ 
https://issues.apache.org/jira/browse/SPARK-52498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-52498:
-----------------------------------
    Labels: SQL pull-request-available  (was: SQL)

> Self-join behaviour is broken and inconsistent in general, and differs
> between the single-pass resolver and the regular resolver
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-52498
>                 URL: https://issues.apache.org/jira/browse/SPARK-52498
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Asif
>            Priority: Major
>              Labels: SQL, pull-request-available
>
> The previous bug 
> [SPARK-47320|https://issues.apache.org/jira/projects/SPARK/issues/SPARK-47320]
> describes the problems the regular analyzer has when dealing with self joins.
> This bug highlights the inconsistent behaviour between the regular analyzer
> and the single-pass analyzer.
> There is an existing test in ExpressionIdAssignerSuite:
> {{test("DataFrame Join, same table, several layers") {
>     withTable("t1") {
>       spark.sql("CREATE TABLE t1 (col1 INT, col2 INT, col3 INT)")
>       val result = withSQLConf(
>         SQLConf.ANALYZER_SINGLE_PASS_RESOLVER_ENABLED.key -> "true",
>         SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "false"
>       ) {
>         val df1 = spark.sql("SELECT col1, 1 AS a, col2, 2 AS b, col3, 3 AS c FROM t1")
>         val df2 = df1
>           .join(df1, df1("col1") === 0)
>           .select(df1("col1"), df1("a"), df1("col2"), df1("b"), df1("col3"), df1("c"))
>         val df3 = df2
>           .join(df2, df2("col1") === 0)
>           .select(df2("col1"), df2("a"), df2("col2"), df2("b"), df2("col3"), df2("c"))
>         df3
>           .join(df3, df3("col1") === 0)
>           .select(df3("col1"), df3("a"), df3("col2"), df3("b"), df3("col3"), df3("c"))
>       }
>       checkExpressionIdAssignment(result.queryExecution.analyzed)
>     }
>   }}}
> The above test also passes when
> SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "true".
> However, the test fails with both of the following combinations:
> Combination 1:
> SQLConf.ANALYZER_SINGLE_PASS_RESOLVER_ENABLED.key -> "false"
> SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "true"
> Combination 2:
> SQLConf.ANALYZER_SINGLE_PASS_RESOLVER_ENABLED.key -> "false"
> SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "false"
> Ideally, the test should fail when
> SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "true" with both resolvers
> (single-pass and original), and should pass when
> SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "false" with both resolvers.
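
The four combinations described above can be summarised in a small table. The following is only a Spark-free sketch for illustration: the names Combination, SelfJoinMatrix, singlePass, failAmbiguous and observedPass are hypothetical and do not exist in Spark, and the observed outcomes are copied from the report rather than measured.

```scala
// Hypothetical summary of the outcomes reported in this issue (not Spark code).
final case class Combination(
    singlePass: Boolean,    // ANALYZER_SINGLE_PASS_RESOLVER_ENABLED
    failAmbiguous: Boolean, // FAIL_AMBIGUOUS_SELF_JOIN_ENABLED
    observedPass: Boolean   // outcome of the test as stated in the report
) {
  // Expectation stated in the report: the test should pass only when
  // FAIL_AMBIGUOUS_SELF_JOIN_ENABLED is "false", whichever resolver is used.
  def expectedPass: Boolean = !failAmbiguous
  def consistent: Boolean = observedPass == expectedPass
}

object SelfJoinMatrix {
  val reported: Seq[Combination] = Seq(
    Combination(singlePass = true, failAmbiguous = false, observedPass = true),
    Combination(singlePass = true, failAmbiguous = true, observedPass = true),
    Combination(singlePass = false, failAmbiguous = true, observedPass = false),
    Combination(singlePass = false, failAmbiguous = false, observedPass = false)
  )

  // Combinations whose reported outcome differs from the stated expectation.
  val mismatches: Seq[Combination] = reported.filterNot(_.consistent)

  private def label(pass: Boolean): String = if (pass) "pass" else "fail"

  def main(args: Array[String]): Unit =
    reported.foreach { c =>
      val verdict = if (c.consistent) "as expected" else "MISMATCH"
      println(
        s"singlePass=${c.singlePass} failAmbiguous=${c.failAmbiguous} " +
          s"observed=${label(c.observedPass)} expected=${label(c.expectedPass)} -> $verdict")
    }
}
```

Under these assumptions, the outcome differs from the stated expectation in two cases: single-pass resolver with the ambiguity check enabled (passes, should fail), and regular resolver with the check disabled (fails, should pass). The regular resolver with the check enabled does fail as expected, although this table does not show whether it fails for the intended reason.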



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
