[ https://issues.apache.org/jira/browse/SPARK-36768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415630#comment-17415630 ]
Willi Raschkowski commented on SPARK-36768:
-------------------------------------------
In the debugger I see that, [on this line|https://github.com/apache/spark/blob/b665782f0d3729928be4ca897ec2eb990b714879/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala#L227],
{{collectMatches}} doesn't produce any matches because {{qualified3Part}} is
an empty map. It seems to be empty because the {{"col"}} attribute in this
{{AttributeSeq}} has an empty qualifier.
On the other hand, if you do
{code:sql}
SELECT t.col FROM parquet.testdata t
{code}
the {{"col"}} attribute in the {{AttributeSeq}} has {{"t"}} as its qualifier,
and thus we get matches [on this line|https://github.com/apache/spark/blob/b665782f0d3729928be4ca897ec2eb990b714879/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala#L253]
when filtering for the {{"t"}} qualifier.
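The aliased case can be modeled the same way. This hypothetical sketch assumes the two-part index is keyed on the last qualifier part plus the lower-cased column name and ignores attributes whose qualifier is empty; {{Attr}}, {{build_qualified_2part}} and {{lookup_2part}} are illustrative names, not Spark API:

{code:python}
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for a Catalyst attribute."""
    name: str
    qualifier: Tuple[str, ...] = ()


def build_qualified_2part(attrs: List[Attr]) -> Dict[Tuple[str, str], List[Attr]]:
    """Index attributes by (last qualifier part, name), lower-cased."""
    index: Dict[Tuple[str, str], List[Attr]] = defaultdict(list)
    for a in attrs:
        if a.qualifier:  # attributes without any qualifier are skipped
            index[(a.qualifier[-1].lower(), a.name.lower())].append(a)
    return dict(index)


def lookup_2part(attrs: List[Attr], qualifier: str, name: str) -> List[Attr]:
    """Resolve a two-part reference such as t.col against the index."""
    return build_qualified_2part(attrs).get((qualifier.lower(), name.lower()), [])


# FROM parquet.testdata t: the alias "t" is the attribute's qualifier.
print(lookup_2part([Attr("col", ("t",))], "t", "col"))
# prints [Attr(name='col', qualifier=('t',))]

# FROM parquet.testdata (no alias): the qualifier is empty, so nothing matches.
print(lookup_2part([Attr("col")], "t", "col"))  # prints []
{code}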
Naively, that makes me wonder why, in the {{"parquet.testdata.col"}} case,
{{"parquet.testdata"}} is not part of the {{"col"}} attribute's qualifier,
whereas when we alias the table, the alias is included as the qualifier.
> Cannot resolve attribute with table reference
> ---------------------------------------------
>
> Key: SPARK-36768
> URL: https://issues.apache.org/jira/browse/SPARK-36768
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 2.4.7, 3.0.3, 3.1.2
> Reporter: Willi Raschkowski
> Priority: Major
>
> In some cases Spark seems unable to resolve attributes with multi-part
> names whose leading parts reference a table. Here's a repro:
> {code:python}
> >>> spark.range(3).toDF("col").write.parquet("testdata")
> # Single name part attribute is fine
> >>> spark.sql("SELECT col FROM parquet.testdata").show()
> +---+
> |col|
> +---+
> |  1|
> |  0|
> |  2|
> +---+
> # Name part with the table reference fails
> >>> spark.sql("SELECT parquet.testdata.col FROM parquet.testdata").show()
> AnalysisException: cannot resolve '`parquet.testdata.col`' given input
> columns: [col]; line 1 pos 7;
> 'Project ['parquet.testdata.col]
> +- Relation[col#50L] parquet
> {code}
> The expected behavior is that {{parquet.testdata.col}} is recognized as
> referring to attribute {{col}} in {{parquet.testdata}} (you'd expect
> {{AttributeSeq.resolve}} to match [this case|https://github.com/apache/spark/blob/b665782f0d3729928be4ca897ec2eb990b714879/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala#L214-L239]).
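To make the expected behavior concrete, here is a simplified, hypothetical model of the resolution order, assuming {{AttributeSeq.resolve}} tries {{db.table.col}} first, then {{table.col}}, then a bare column name, with leftover parts treated as nested fields. It ignores ambiguity errors and struct-field extraction, and {{Attr}} and {{resolve}} are illustrative names only:

{code:python}
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for a Catalyst attribute."""
    name: str
    qualifier: Tuple[str, ...] = ()


def _eq(a: str, b: str) -> bool:
    # Case-insensitive comparison, like Spark's default resolver.
    return a.lower() == b.lower()


def resolve(attrs: Sequence[Attr], parts: Sequence[str]) -> Optional[Tuple[Attr, List[str]]]:
    """Return (matched attribute, leftover nested-field parts) or None.

    Takes the first hit at each step; a real resolver would report ambiguity.
    """
    # 1) First two parts as (db, table), third part as the column name.
    if len(parts) >= 3:
        db, table, name = parts[0], parts[1], parts[2]
        hits = [a for a in attrs
                if len(a.qualifier) >= 2 and _eq(a.qualifier[-2], db)
                and _eq(a.qualifier[-1], table) and _eq(a.name, name)]
        if hits:
            return hits[0], list(parts[3:])
    # 2) First part as table name or alias, second part as the column name.
    if len(parts) >= 2:
        hits = [a for a in attrs
                if a.qualifier and _eq(a.qualifier[-1], parts[0]) and _eq(a.name, parts[1])]
        if hits:
            return hits[0], list(parts[2:])
    # 3) Fall back to treating the first part as a bare column name.
    hits = [a for a in attrs if _eq(a.name, parts[0])]
    if hits:
        return hits[0], list(parts[1:])
    return None


parts = ["parquet", "testdata", "col"]
print(resolve([Attr("col")], ["col"]))  # matches via step 3
print(resolve([Attr("col")], parts))  # prints None: the failing repro
print(resolve([Attr("col", ("parquet", "testdata"))], parts))  # matches via step 1
{code}

Under this model the repro fails because step 3 looks up {{parquet}} as a column name; if the attribute carried the qualifier {{("parquet", "testdata")}}, step 1 would match.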
--
This message was sent by Atlassian Jira
(v8.3.4#803005)