[ 
https://issues.apache.org/jira/browse/SPARK-19086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802885#comment-15802885
 ] 

Herman van Hovell commented on SPARK-19086:
-------------------------------------------

We alternate column resolution between the inner and the outer plan during 
subquery resolution. In this case we cannot resolve the query using the inner 
columns (because of the aggregate), but we can resolve it using the outer 
column.

So I am not entirely sure this is a problem, since the inner aggregate does not 
produce a column named `t2c` and the outer plan does.

I look forward to be convinced otherwise :)

> Improper scoping of name resolution of columns in HAVING clause
> ---------------------------------------------------------------
>
>                 Key: SPARK-19086
>                 URL: https://issues.apache.org/jira/browse/SPARK-19086
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Nattavut Sutyanyong
>            Priority: Minor
>
> There seems to be a problem on the scoping of name resolution of columns in a 
> HAVING clause.
> Here is a scenario of the problem:
> {code}
> // A simplified version of TC 01.13 from PR-16337
> Seq((1,1,1)).toDF("t1a", "t1b", "t1c").createOrReplaceTempView("t1")
> Seq((1,1,1)).toDF("t2a", "t2b", "t2c").createOrReplaceTempView("t2")
> // This is okay. 
> // Error: t2c is unresolved
> sql("select t2a from t2 group by t2a having t2c = 8").show
> // This is okay as t2c is resolved to the t2 on the parent side
> // because t2 in the subquery does not output column t2c.
> sql("select * from t2 where t2a in (select t2a from (select t2a from t2) t2 
> group by t2a having t2c = 8)").explain(true)
> // This is the problem.
> sql("select * from t2 where t2a in (select t2a from t2 group by t2a having 
> t2c = 8)").explain(true)
> == Analyzed Logical Plan ==
> t2a: int, t2b: int, t2c: int
> Project [t2a#22, t2b#23, t2c#24]
> +- Filter predicate-subquery#38 [(t2a#22 = t2a#22#49) && (t2c#24 = 8)]
>    :  +- Project [t2a#22 AS t2a#22#49]
>    :     +- Aggregate [t2a#22], [t2a#22]
>    :        +- SubqueryAlias t2, `t2`
>    :           +- Project [_1#18 AS t2a#22, _2#19 AS t2b#23, _3#20 AS t2c#24]
>    :              +- LocalRelation [_1#18, _2#19, _3#20]
>    +- SubqueryAlias t2, `t2`
>       +- Project [_1#18 AS t2a#22, _2#19 AS t2b#23, _3#20 AS t2c#24]
>          +- LocalRelation [_1#18, _2#19, _3#20]
> {code}
> We should not resolve {{t2c}} in the subquery to the outer {{t2}} on the 
> parent side. It should try to resolve {{t2c}} to the {{t2}} in the subquery 
> from its current scope and raise an exception because it is invalid to pull 
> up the column {{t2c}} from the {{Aggregate}} operator below.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to