[ 
https://issues.apache.org/jira/browse/SPARK-45722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Dmitriev updated SPARK-45722:
------------------------------------
    Description: 
I have the following code, which I expect to work:
{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

session = SparkSession.Builder().getOrCreate()
A = session.createDataFrame([(1,)], ['a'])
B = session.createDataFrame([(1,)], ['b'])

A.join(B).select(B.b)  # works fine

# C has the same columns as A (the same column instances, not only the same names)
C = A.join(A.join(B), on=F.lit(False), how='leftanti')

C.join(B).select(B.b)  # doesn't work: Spark says B.b is ambiguous
{code}
Exception below:
{code}
AnalysisException: Column b#11L are ambiguous. It's probably because you joined 
several Datasets together, and some of these Datasets are the same. This column 
points to one of the Datasets but Spark is unable to figure out which one. 
Please alias the Datasets with different names via `Dataset.as` before joining 
them, and specify the column using qualified name, e.g. 
`df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You can also set 
spark.sql.analyzer.failAmbiguousSelfJoin to false to disable this check.{code}
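A possible workaround sketch (not part of the original report) is the one the exception message itself suggests: alias the DataFrames before joining and select the column by its qualified name instead of via the `B.b` attribute reference. The alias names `c` and `b2` below are illustrative choices, not anything from the report.

{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

session = SparkSession.Builder().getOrCreate()
A = session.createDataFrame([(1,)], ['a'])
B = session.createDataFrame([(1,)], ['b'])

# Same repro as above: the leftanti join with an always-false condition
# keeps every row of A, so C carries A's column instances.
C = A.join(A.join(B), on=F.lit(False), how='leftanti')

# Alias both sides and reference the column through the alias, so the
# analyzer can resolve it unambiguously.
result = C.alias('c').join(B.alias('b2')).select(F.col('b2.b'))
result.show()
{code}

Alternatively, `spark.sql.analyzer.failAmbiguousSelfJoin` can be set to `false` to disable the check entirely, though that silences the detection for genuine self-join ambiguities as well.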


> False positive when checking for ambiguous columns
> ---------------------------------------------------
>
>                 Key: SPARK-45722
>                 URL: https://issues.apache.org/jira/browse/SPARK-45722
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.4.0
>         Environment: py3.11 
> pyspark 3.4.0
>            Reporter: Alexey Dmitriev
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
