[ 
https://issues.apache.org/jira/browse/SPARK-36768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415630#comment-17415630
 ] 

Willi Raschkowski commented on SPARK-36768:
-------------------------------------------

In the debugger I see [on this 
line|https://github.com/apache/spark/blob/b665782f0d3729928be4ca897ec2eb990b714879/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala#L227]
 {{collectMatches}} doesn't produce any matches because {{qualified3Part}} is 
an empty map. And it seems to be an empty map because the {{"col"}} attribute 
in this {{AttributeSeq}} has an empty qualifier.

On the other hand, if you do
{code:sql}
SELECT t.col FROM parquet.testdata t
{code}
the {{"col"}} attribute in the {{AttributeSeq}} has {{"t"}} as its qualifier. And 
thus we get matches [on this 
line|https://github.com/apache/spark/blob/b665782f0d3729928be4ca897ec2eb990b714879/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala#L253]
 when filtering for the {{"t"}} qualifier.

Naively, that makes me wonder why, in the {{"parquet.testdata.col"}} case, 
{{"parquet.testdata"}} is not part of the {{"col"}} attribute's qualifier, 
whereas when we alias the table, the alias is included as the qualifier.
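
To make the observation above concrete, here is a toy Python model of qualifier-based lookup. It is only a simplified sketch of the behavior seen in the debugger, not Spark's actual {{AttributeSeq}} code: the {{Attribute}} class and the {{resolve}} function are hypothetical names, and the rule that only attributes with a two-part qualifier enter the three-part index is an assumption based on the empty {{qualified3Part}} map.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a Catalyst attribute: a name plus its qualifier parts."""
    name: str
    qualifier: Tuple[str, ...] = ()


def resolve(name_parts: Sequence[str], attrs: Sequence[Attribute]) -> List[Attribute]:
    """Toy, case-insensitive lookup of a 1-, 2-, or 3-part name."""
    parts = [p.lower() for p in name_parts]

    if len(parts) == 3:
        # Assumption: the 3-part index only contains attributes whose
        # qualifier has exactly two parts (e.g. database and table).
        qualified_3part: Dict[Tuple[str, str, str], List[Attribute]] = {}
        for attr in attrs:
            if len(attr.qualifier) == 2:
                key = (
                    attr.qualifier[0].lower(),
                    attr.qualifier[1].lower(),
                    attr.name.lower(),
                )
                qualified_3part.setdefault(key, []).append(attr)
        return qualified_3part.get((parts[0], parts[1], parts[2]), [])

    if len(parts) == 2:
        # Match on the last qualifier part, e.g. a table alias such as "t".
        return [
            attr
            for attr in attrs
            if attr.qualifier
            and attr.qualifier[-1].lower() == parts[0]
            and attr.name.lower() == parts[1]
        ]

    if len(parts) == 1:
        return [attr for attr in attrs if attr.name.lower() == parts[0]]

    return []


# What the debugger shows for "FROM parquet.testdata": no qualifier at all.
unqualified = [Attribute("col")]
# What the debugger shows for "FROM parquet.testdata t": the alias is the qualifier.
aliased = [Attribute("col", ("t",))]
# Hypothetical: what the attribute would need to look like for the 3-part name to match.
path_qualified = [Attribute("col", ("parquet", "testdata"))]

print(resolve(["col"], unqualified))
print(resolve(["parquet", "testdata", "col"], unqualified))
print(resolve(["t", "col"], aliased))
print(resolve(["parquet", "testdata", "col"], path_qualified))
```

In this model the three-part lookup against the unqualified attribute finds nothing because the index is empty, which mirrors the failing query. The last call only succeeds because the qualifier was filled in by hand, which is exactly what does not seem to happen in Spark for the {{parquet.testdata}} relation.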

> Cannot resolve attribute with table reference
> ---------------------------------------------
>
>                 Key: SPARK-36768
>                 URL: https://issues.apache.org/jira/browse/SPARK-36768
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 2.4.7, 3.0.3, 3.1.2
>            Reporter: Willi Raschkowski
>            Priority: Major
>
> Spark seems in some cases unable to resolve attributes that contain 
> multi-part names where the first parts reference a table. Here's a repro:
> {code:python}
> >>> spark.range(3).toDF("col").write.parquet("testdata")
> # Single name part attribute is fine
> >>> spark.sql("SELECT col FROM parquet.testdata").show()
> +---+
> |col|
> +---+
> |  1|
> |  0|
> |  2|
> +---+
> # Name part with the table reference fails
> >>> spark.sql("SELECT parquet.testdata.col FROM parquet.testdata").show()
> AnalysisException: cannot resolve '`parquet.testdata.col`' given input 
> columns: [col]; line 1 pos 7;
> 'Project ['parquet.testdata.col]
> +- Relation[col#50L] parquet
> {code}
> The expected behavior is that {{parquet.testdata.col}} is recognized as 
> referring to attribute {{col}} in {{parquet.testdata}} (you'd expect 
> {{AttributeSeq.resolve}} to match [this 
> case|https://github.com/apache/spark/blob/b665782f0d3729928be4ca897ec2eb990b714879/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala#L214-L239]).


