[ https://issues.apache.org/jira/browse/SPARK-21538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-21538:
----------------------------
    Issue Type: Improvement  (was: Story)

> Attribute resolution inconsistency in Dataset API
> -------------------------------------------------
>
>                 Key: SPARK-21538
>                 URL: https://issues.apache.org/jira/browse/SPARK-21538
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Adrian Ionescu
>
> {code}
> spark.range(1).withColumnRenamed("id", "x").sort(col("id"))  // works
> spark.range(1).withColumnRenamed("id", "x").sort($"id")  // works
> spark.range(1).withColumnRenamed("id", "x").sort('id) // works
> spark.range(1).withColumnRenamed("id", "x").sort("id") // fails with:
> org.apache.spark.sql.AnalysisException: Cannot resolve column name "id" among (x);
> ...
> {code}
> It looks like the Dataset API functions taking {{String}} use the basic
> resolver, which only looks at the columns available at that level, whereas all
> the other ways of expressing an attribute are resolved lazily by the analyzer.
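> A minimal, self-contained sketch of the two paths, assuming a spark-shell session (so {{spark}} is in scope, as in the snippet above); the only difference between the two calls is whether the attribute is resolved eagerly against the Dataset's current schema or lazily by the analyzer:
> {code}
> import org.apache.spark.sql.AnalysisException
> import org.apache.spark.sql.functions.col
>
> val ds = spark.range(1).withColumnRenamed("id", "x")
>
> // Column argument: stays unresolved until the analyzer runs, which can still
> // find `id` in the underlying Range output and re-add it for the sort.
> ds.sort(col("id")).show()
>
> // String argument: resolved eagerly against the current schema (only x),
> // so it fails before the analyzer ever sees the plan.
> try ds.sort("id").show()
> catch { case e: AnalysisException => println(e.getMessage) }
> {code}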
> The reason why the first 3 calls work is explained in the docs for {{object 
> ResolveMissingReferences}}:
> {code}
>   /**
>    * In many dialects of SQL it is valid to sort by attributes that are not
>    * present in the SELECT clause. This rule detects such queries and adds the
>    * required attributes to the original projection, so that they will be
>    * available during sorting. Another projection is added to remove these
>    * attributes after sorting.
>    *
>    * The HAVING clause could also use grouping columns that are not present
>    * in the SELECT.
>    */
> {code}
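> The SQL behaviour this rule enables looks like the following (illustrative snippet, again assuming a spark-shell session): the ORDER BY column is absent from the SELECT list, yet the query runs fine.
> {code}
> spark.range(5).createOrReplaceTempView("t")
>
> // `id` is not in the SELECT list; per the doc above, the required attribute is
> // added to the projection for sorting and stripped out again afterwards.
> spark.sql("SELECT id * 2 AS doubled FROM t ORDER BY id").show()
> {code}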
> For consistency, it would be good to use the same attribute resolution 
> mechanism everywhere.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
