[jira] [Commented] (SPARK-17348) Incorrect results from subquery transformation

Herman van Hovell (JIRA) Wed, 31 Aug 2016 15:08:08 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453516#comment-15453516
 ]


Herman van Hovell commented on SPARK-17348:
-------------------------------------------

This is an interesting one. TBH I have never seen such a query being used in 
practice. Could you tell me what the analyzed plan should look like, because 
the only solution to me would be to implicitly join {{t1}} to {{t2}} in the 
subquery. 

For 2.0.1 we should fail analysis in this case. We have an analyzer rule in 
place to make sure no one uses aggregates in combination with correlated scalar 
subqueries. We could extend that or move that into analysis. It would be nice 
to have a fix for 2.1.

> Incorrect results from subquery transformation
> ----------------------------------------------
>
>                 Key: SPARK-17348
>                 URL: https://issues.apache.org/jira/browse/SPARK-17348
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Nattavut Sutyanyong
>              Labels: correctness
>
> {noformat}
> Seq((1,1)).toDF("c1","c2").createOrReplaceTempView("t1")
> Seq((1,1),(2,0)).toDF("c1","c2").createOrReplaceTempView("t2")
> sql("select c1 from t1 where c1 in (select max(t2.c1) from t2 where t1.c2 >= 
> t2.c2)").show
> +---+
> | c1|
> +---+
> |  1|
> +---+
> {noformat}
> The correct result of the above query should be an empty set. Here is an 
> explanation:
> Both rows from T2 satisfies the correlated predicate T1.C2 >= T2.C2 when 
> T1.C1 = 1 so both rows needs to be processed in the same group of the 
> aggregation process in the subquery. The result of the aggregation yields 
> MAX(T2.C1) as 2. Therefore, the result of the evaluation of the predicate 
> T1.C1 (which is 1) IN MAX(T2.C1) (which is 2) should be an empty set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SPARK-17348) Incorrect results from subquery transformation

Reply via email to