[jira] [Updated] (SPARK-43541) Incorrect column resolution on FULL OUTER JOIN with USING

Max Gekk (Jira) Wed, 17 May 2023 12:50:04 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-43541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Max Gekk updated SPARK-43541:
-----------------------------
    Description: 
This was tested on Spark 3.3.2 and Spark 3.4.0.

{code}
Causes [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with 
name `aws_dbr_a`.`key` cannot be resolved. Did you mean one of the following? 
[`key`].; line 4, pos 7
{code}


FULL OUTER JOIN with USING and/or the WHERE seems relevant since I can get the 
query to work with any of these modifications. 


{code}
# -- FULL OUTER JOIN
                   WITH
                   aws_dbr_a AS (select key from values ('a') t(key)),
                   gcp_pro_b AS (select key from values ('a') t(key))
                   SELECT aws_dbr_a.key
                   FROM aws_dbr_a
                   FULL OUTER JOIN gcp_pro_b USING (key)
                   WHERE aws_dbr_a.key NOT LIKE 'spark.clusterUsageTags.%';
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name 
`aws_dbr_a`.`key` cannot be resolved. Did you mean one of the following? 
[`key`].; line 4 pos 7
# -- INNER JOIN
                   WITH
                   aws_dbr_a AS (select key from values ('a') t(key)),
                   gcp_pro_b AS (select key from values ('a') t(key))
                   SELECT aws_dbr_a.key
                   FROM aws_dbr_a
                   JOIN gcp_pro_b USING (key)
                   WHERE aws_dbr_a.key NOT LIKE 'spark.clusterUsageTags.%';
+-----+
| key |
|-----|
| a   |
+-----+
1 row in set
Time: 0.507s
# -- NO Filter
                   WITH
                   aws_dbr_a AS (select key from values ('a') t(key)),
                   gcp_pro_b AS (select key from values ('a') t(key))
                   SELECT aws_dbr_a.key
                   FROM aws_dbr_a
                   FULL OUTER JOIN gcp_pro_b USING (key);
+-----+
| key |
|-----|
| a   |
+-----+
1 row in set
Time: 1.021s
# -- ON instead of USING
                   WITH
                   aws_dbr_a AS (select key from values ('a') t(key)),
                   gcp_pro_b AS (select key from values ('a') t(key))
                   SELECT aws_dbr_a.key
                   FROM aws_dbr_a
                   FULL OUTER JOIN gcp_pro_b ON aws_dbr_a.key = gcp_pro_b.key
                   WHERE aws_dbr_a.key NOT LIKE 'spark.clusterUsageTags.%';
+-----+
| key |
|-----|
| a   |
+-----+
1 row in set
Time: 0.514s
{code}


  was:
This was tested on Spark 3.3.2 and Spark 3.4.0.

{code}
Causes [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with 
name `aws_dbr_a`.`key` cannot be resolved. Did you mean one of the following? 
[`key`].; line 4, pos 7
{code}


FULL OUTER JOIN with USING and/or the WHERE seems relevant since I can get the 
query to work with any of these modifications. 


{code}
# WITH
                   aws_dbr_a AS (select key from values ('a') t(key)),
                   gcp_pro_b AS (select key from values ('a') t(key))
                   SELECT aws_dbr_a.key
                   FROM aws_dbr_a
                   FULL OUTER JOIN gcp_pro_b USING (key)
                   WHERE aws_dbr_a.key NOT LIKE 'spark.clusterUsageTags.%';
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name 
`aws_dbr_a`.`key` cannot be resolved. Did you mean one of the following? 
[`key`].; line 4 pos 7
# -- INNER JOIN
                   WITH
                   aws_dbr_a AS (select key from values ('a') t(key)),
                   gcp_pro_b AS (select key from values ('a') t(key))
                   SELECT aws_dbr_a.key
                   FROM aws_dbr_a
                   JOIN gcp_pro_b USING (key)
                   WHERE aws_dbr_a.key NOT LIKE 'spark.clusterUsageTags.%';
+-----+
| key |
|-----|
| a   |
+-----+
1 row in set
Time: 0.507s
# -- NO Filter
                   WITH
                   aws_dbr_a AS (select key from values ('a') t(key)),
                   gcp_pro_b AS (select key from values ('a') t(key))
                   SELECT aws_dbr_a.key
                   FROM aws_dbr_a
                   FULL OUTER JOIN gcp_pro_b USING (key);
+-----+
| key |
|-----|
| a   |
+-----+
1 row in set
Time: 1.021s
# -- ON instead of USING
                   WITH
                   aws_dbr_a AS (select key from values ('a') t(key)),
                   gcp_pro_b AS (select key from values ('a') t(key))
                   SELECT aws_dbr_a.key
                   FROM aws_dbr_a
                   FULL OUTER JOIN gcp_pro_b ON aws_dbr_a.key = gcp_pro_b.key
                   WHERE aws_dbr_a.key NOT LIKE 'spark.clusterUsageTags.%';
+-----+
| key |
|-----|
| a   |
+-----+
1 row in set
Time: 0.514s
{code}



> Incorrect column resolution on FULL OUTER JOIN with USING
> ---------------------------------------------------------
>
>                 Key: SPARK-43541
>                 URL: https://issues.apache.org/jira/browse/SPARK-43541
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.2, 3.4.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>
> This was tested on Spark 3.3.2 and Spark 3.4.0.
> {code}
> Causes [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter 
> with name `aws_dbr_a`.`key` cannot be resolved. Did you mean one of the 
> following? [`key`].; line 4, pos 7
> {code}
> FULL OUTER JOIN with USING and/or the WHERE seems relevant since I can get 
> the query to work with any of these modifications. 
> {code}
> # -- FULL OUTER JOIN
>                    WITH
>                    aws_dbr_a AS (select key from values ('a') t(key)),
>                    gcp_pro_b AS (select key from values ('a') t(key))
>                    SELECT aws_dbr_a.key
>                    FROM aws_dbr_a
>                    FULL OUTER JOIN gcp_pro_b USING (key)
>                    WHERE aws_dbr_a.key NOT LIKE 'spark.clusterUsageTags.%';
> [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name 
> `aws_dbr_a`.`key` cannot be resolved. Did you mean one of the following? 
> [`key`].; line 4 pos 7
> # -- INNER JOIN
>                    WITH
>                    aws_dbr_a AS (select key from values ('a') t(key)),
>                    gcp_pro_b AS (select key from values ('a') t(key))
>                    SELECT aws_dbr_a.key
>                    FROM aws_dbr_a
>                    JOIN gcp_pro_b USING (key)
>                    WHERE aws_dbr_a.key NOT LIKE 'spark.clusterUsageTags.%';
> +-----+
> | key |
> |-----|
> | a   |
> +-----+
> 1 row in set
> Time: 0.507s
> # -- NO Filter
>                    WITH
>                    aws_dbr_a AS (select key from values ('a') t(key)),
>                    gcp_pro_b AS (select key from values ('a') t(key))
>                    SELECT aws_dbr_a.key
>                    FROM aws_dbr_a
>                    FULL OUTER JOIN gcp_pro_b USING (key);
> +-----+
> | key |
> |-----|
> | a   |
> +-----+
> 1 row in set
> Time: 1.021s
> # -- ON instead of USING
>                    WITH
>                    aws_dbr_a AS (select key from values ('a') t(key)),
>                    gcp_pro_b AS (select key from values ('a') t(key))
>                    SELECT aws_dbr_a.key
>                    FROM aws_dbr_a
>                    FULL OUTER JOIN gcp_pro_b ON aws_dbr_a.key = gcp_pro_b.key
>                    WHERE aws_dbr_a.key NOT LIKE 'spark.clusterUsageTags.%';
> +-----+
> | key |
> |-----|
> | a   |
> +-----+
> 1 row in set
> Time: 0.514s
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (SPARK-43541) Incorrect column resolution on FULL OUTER JOIN with USING

Reply via email to