[ 
https://issues.apache.org/jira/browse/SPARK-36768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Willi Raschkowski updated SPARK-36768:
--------------------------------------
    Description: 
In some cases Spark cannot resolve an attribute given by a multi-part name 
whose leading parts reference the table. Here's a repro:
{code:python}
>>> spark.range(3).toDF("col").write.parquet("testdata")

# An attribute with a single name part resolves fine
>>> spark.sql("SELECT col FROM parquet.testdata").show()
+---+
|col|
+---+
|  1|
|  0|
|  2|
+---+

# A multi-part name that includes the table reference fails
>>> spark.sql("SELECT parquet.testdata.col FROM parquet.testdata").show()

AnalysisException: cannot resolve '`parquet.testdata.col`' given input columns: 
[col]; line 1 pos 7;
'Project ['parquet.testdata.col]
+- Relation[col#50L] parquet
{code}

The expected behavior is that {{parquet.testdata.col}} is recognized as 
referring to attribute {{col}} in {{parquet.testdata}}.

This also reproduces on master at the time of writing.
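
To make the expected behavior concrete, here is a minimal pure-Python sketch of qualifier-aware name resolution. It is a toy model, not Spark's analyzer: the {{Relation}} class and {{resolve}} function are hypothetical names introduced only for illustration, and nested struct fields are ignored. It assumes a qualified name should resolve when its leading parts match the trailing parts of the relation's qualifier.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Relation:
    """Toy stand-in for a relation: a qualifier plus its output columns."""
    qualifier: Sequence[str]  # e.g. ("parquet", "testdata")
    columns: Sequence[str]    # e.g. ("col",)


def resolve(name_parts: Sequence[str], relation: Relation) -> Optional[str]:
    """Return the matching column name, or None if the name does not resolve.

    Hypothetical helper: the last name part is the column, and any leading
    parts must match the tail of the relation's qualifier.
    """
    *prefix, column = name_parts
    if column not in relation.columns:
        return None
    if not prefix:
        # Unqualified name: the column match alone is enough.
        return column
    qualifier = list(relation.qualifier)
    if len(prefix) <= len(qualifier) and qualifier[len(qualifier) - len(prefix):] == prefix:
        return column
    return None


if __name__ == "__main__":
    rel = Relation(qualifier=("parquet", "testdata"), columns=("col",))
    print(resolve(["col"], rel))
    print(resolve(["parquet", "testdata", "col"], rel))
```

Under this model both {{col}} and {{parquet.testdata.col}} resolve to {{col}}, which is the outcome the report asks for; a name with a mismatched qualifier returns {{None}}.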


> Cannot resolve attribute with table reference
> ---------------------------------------------
>
>                 Key: SPARK-36768
>                 URL: https://issues.apache.org/jira/browse/SPARK-36768
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 2.4.7, 3.0.3, 3.1.2
>            Reporter: Willi Raschkowski
>            Priority: Major
>


