[
https://issues.apache.org/jira/browse/SPARK-26996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778246#comment-16778246
]
Dongjoon Hyun commented on SPARK-26996:
---------------------------------------
Thank you for reporting, [~ilya745]. I also confirmed that the given example
works on Spark 2.3.3 and fails on Spark 2.4.0.
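For anyone hitting this on 2.4.0, a possible workaround (a sketch against the reproducer's view names, not a fix) is to evaluate the scalar subquery separately and inline the result as a literal, so no scalar-subquery expression reaches the partition filter:

```scala
// Workaround sketch: materialize the scalar value first, then
// substitute it into the main query as a plain string literal.
// Assumes the reproducer's views "latest_dates" and "source1" exist.
val latest = spark.sql(
  "select latest_date from latest_dates where source = 'source1'"
).collect()(0).getString(0)

// The filter now compares against a literal, avoiding the
// "Cannot evaluate expression: scalar-subquery" path.
spark.sql(
  s"select max(date), 'source1' as category from source1 where date >= '$latest'"
).show()
```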
> Scalar Subquery not handled properly in Spark 2.4
> --------------------------------------------------
>
> Key: SPARK-26996
> URL: https://issues.apache.org/jira/browse/SPARK-26996
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.0
> Reporter: Ilya Peysakhov
> Priority: Critical
>
> Spark 2.4 reports an error when a query filters on the result of a subquery
> that returns only 1 row and 1 column (a scalar subquery).
>
> Reproducer is below. No other data is needed to reproduce the error.
> This will write a table of dates and strings, write another "fact" table of
> ints partitioned by date, then register both as temporary views and filter
> the "fact" view on a scalar subquery against the first table. This is done
> within spark-shell in vanilla Spark 2.4 (also reproduced on AWS EMR 5.20.0).
> -------------------------
> spark.sql("select '2018-01-01' as latest_date, 'source1' as source UNION ALL select '2018-01-02', 'source2' UNION ALL select '2018-01-03', 'source3' UNION ALL select '2018-01-04', 'source4'").write.mode("overwrite").save("/latest_dates")
> val mydatetable = spark.read.load("/latest_dates")
> mydatetable.createOrReplaceTempView("latest_dates")
> spark.sql("select 50 as mysum, '2018-01-01' as date UNION ALL select 100, '2018-01-02' UNION ALL select 300, '2018-01-03' UNION ALL select 3444, '2018-01-01' UNION ALL select 600, '2018-08-30'").write.mode("overwrite").partitionBy("date").save("/mypartitioneddata")
> val source1 = spark.read.load("/mypartitioneddata")
> source1.createOrReplaceTempView("source1")
> spark.sql("select max(date), 'source1' as category from source1 where date >= (select latest_date from latest_dates where source='source1')").show
> ----------------------------
>
> Error summary
> -------
> java.lang.UnsupportedOperationException: Cannot evaluate expression:
> scalar-subquery#35 []
> at
> org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:258)
> at
> org.apache.spark.sql.catalyst.expressions.ScalarSubquery.eval(subquery.scala:246)
> -------
> This reproducer works in previous versions (2.3.2, 2.3.1, etc.).
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]