[ 
https://issues.apache.org/jira/browse/SPARK-19086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803381#comment-15803381
 ] 

Nattavut Sutyanyong commented on SPARK-19086:
---------------------------------------------

I wore a conservative lens when opening this JIRA. This is from the point of 
view that `t2c` is available on the current query block but due to the 
aggregate, it just cannot be referenced above the aggregate. I have checked on 
MySQL. It uses a progressive lens. So does Spark. When it cannot reference a 
column, it goes out to the next outer scope to find one.

The problem with this, especially when deep correlation is supported, is the 
outer scope where the column is resolved can be 10 levels up. (okay, I am 
exaggerated here. Who, or which program, on earth will create such SQL!) It 
would be very confusing for the reader. This can be easily solved by adding the 
correlation name to the column.

I will close the JIRA.



> Improper scoping of name resolution of columns in HAVING clause
> ---------------------------------------------------------------
>
>                 Key: SPARK-19086
>                 URL: https://issues.apache.org/jira/browse/SPARK-19086
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Nattavut Sutyanyong
>            Priority: Minor
>
> There seems to be a problem on the scoping of name resolution of columns in a 
> HAVING clause.
> Here is a scenario of the problem:
> {code}
> // A simplified version of TC 01.13 from PR-16337
> Seq((1,1,1)).toDF("t1a", "t1b", "t1c").createOrReplaceTempView("t1")
> Seq((1,1,1)).toDF("t2a", "t2b", "t2c").createOrReplaceTempView("t2")
> // This is okay. 
> // Error: t2c is unresolved
> sql("select t2a from t2 group by t2a having t2c = 8").show
> // This is okay as t2c is resolved to the t2 on the parent side
> // because t2 in the subquery does not output column t2c.
> sql("select * from t2 where t2a in (select t2a from (select t2a from t2) t2 
> group by t2a having t2c = 8)").explain(true)
> // This is the problem.
> sql("select * from t2 where t2a in (select t2a from t2 group by t2a having 
> t2c = 8)").explain(true)
> == Analyzed Logical Plan ==
> t2a: int, t2b: int, t2c: int
> Project [t2a#22, t2b#23, t2c#24]
> +- Filter predicate-subquery#38 [(t2a#22 = t2a#22#49) && (t2c#24 = 8)]
>    :  +- Project [t2a#22 AS t2a#22#49]
>    :     +- Aggregate [t2a#22], [t2a#22]
>    :        +- SubqueryAlias t2, `t2`
>    :           +- Project [_1#18 AS t2a#22, _2#19 AS t2b#23, _3#20 AS t2c#24]
>    :              +- LocalRelation [_1#18, _2#19, _3#20]
>    +- SubqueryAlias t2, `t2`
>       +- Project [_1#18 AS t2a#22, _2#19 AS t2b#23, _3#20 AS t2c#24]
>          +- LocalRelation [_1#18, _2#19, _3#20]
> {code}
> We should not resolve {{t2c}} in the subquery to the outer {{t2}} on the 
> parent side. It should try to resolve {{t2c}} to the {{t2}} in the subquery 
> from its current scope and raise an exception because it is invalid to pull 
> up the column {{t2c}} from the {{Aggregate}} operator below.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to