[ https://issues.apache.org/jira/browse/SPARK-49646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-49646.
---------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 48109
[https://github.com/apache/spark/pull/48109]
> fix subquery decorrelation for union / set operations when
> parentOuterReferences has references not covered in
> collectedChildOuterReferences
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-49646
> URL: https://issues.apache.org/jira/browse/SPARK-49646
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 4.0.0
> Reporter: Avery Qi
> Assignee: Avery Qi
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Spark currently fails with an internal error on queries such as:
> ```
> create table IF NOT EXISTS t(t1 INT,t2 int) using json;
> CREATE TABLE IF NOT EXISTS a (a1 INT) using json;
> select 1
> from t as t_outer
> left join
>   lateral(
>     select b1, b2
>     from
>     (
>       select
>         a.a1 as b1,
>         1 as b2
>       from a
>       union
>       select t_outer.t1 as b1,
>         null as b2
>     ) as t_inner
>     where (t_inner.b1 < t_outer.t2 or t_inner.b1 is null) and t_inner.b1 = t_outer.t1
>     order by t_inner.b1, t_inner.b2 desc limit 1
>   ) as lateral_table
> ```
> The resulting stack trace is:
> org.apache.spark.SparkException: <Redacted Exception Message>
>   at org.apache.spark.SparkException$.internalError(SparkException.scala:97)
>   at org.apache.spark.SparkException$.internalError(SparkException.scala:101)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:447)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
>   at org.apache.spark.sql.catalyst.plans.logical.Project.mapChildren(basicLogicalOperators.scala:87)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$5(DecorrelateInnerQuery.scala:453)
>   at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:744)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:451)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
>   at org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:1470)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
>   at org.apache.spark.sql.catalyst.plans.logical.Filter.mapChildren(basicLogicalOperators.scala:344)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
>   ...
>
> See this investigation doc for more context:
> [https://docs.google.com/document/d/1HtBDPKVD6pgGntTXdPVX27xH7PdcKTYNyQJLnwr7T-U/edit?usp=sharing]
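
The condition named in the issue title can be illustrated with a toy model. This is only a sketch using plain Python sets, not Spark's actual DecorrelateInnerQuery code. It assumes that parentOuterReferences holds the outer columns used by operators above the union and that collectedChildOuterReferences holds the outer columns gathered from the union's branches; the helper name uncovered_outer_references and the assignment of columns to each set are hypothetical, based on one reading of the query above.

```python
# Toy model only: Python sets stand in for Catalyst attribute sets.
# Which columns land in which set is an assumed reading of the query above.

def uncovered_outer_references(parent_outer_refs, child_outer_refs_per_branch):
    """Return the parent outer references that no union branch references."""
    collected = set().union(*child_outer_refs_per_branch)
    return set(parent_outer_refs) - collected

# The filter above the union compares t_inner.b1 with t_outer.t2 and t_outer.t1.
parent_outer_refs = {"t_outer.t1", "t_outer.t2"}

# Branch 1 (select from a) uses no outer column; branch 2 selects t_outer.t1.
child_outer_refs_per_branch = [set(), {"t_outer.t1"}]

missing = uncovered_outer_references(parent_outer_refs, child_outer_refs_per_branch)
print(sorted(missing))  # prints ['t_outer.t2']
```

Under these assumptions, t_outer.t2 is referenced above the union but by neither branch, which matches the case described in the title.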