Avery Qi created SPARK-49646:
--------------------------------

             Summary: fix subquery decorrelation for union / set operations when parentOuterReferences has references not covered in collectedChildOuterReferences
                 Key: SPARK-49646
                 URL: https://issues.apache.org/jira/browse/SPARK-49646
             Project: Spark
          Issue Type: Bug
          Components: Optimizer
    Affects Versions: 4.0.0
            Reporter: Avery Qi
Spark currently cannot handle queries like:

```
create table IF NOT EXISTS t(t1 INT, t2 int) using json;
CREATE TABLE IF NOT EXISTS a (a1 INT) using json;

select 1 from t as t_outer
left join lateral (
  select b1, b2 from (
    select a.a1 as b1, 1 as b2 from a
    union
    select t_outer.t1 as b1, null as b2
  ) as t_inner
  where (t_inner.b1 < t_outer.t2 or t_inner.b1 is null)
    and t_inner.b1 = t_outer.t1
  order by t_inner.b1, t_inner.b2 desc limit 1
) as lateral_table
```

The error stack trace is:

```
org.apache.spark.SparkException: <Redacted Exception Message>
	at org.apache.spark.SparkException$.internalError(SparkException.scala:97)
	at org.apache.spark.SparkException$.internalError(SparkException.scala:101)
	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:447)
	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
	at org.apache.spark.sql.catalyst.plans.logical.Project.mapChildren(basicLogicalOperators.scala:87)
	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$5(DecorrelateInnerQuery.scala:453)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.map(TraversableLike.scala:286)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:744)
	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:451)
	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
	at org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:1470)
	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
	at org.apache.spark.sql.catalyst.plans.logical.Filter.mapChildren(basicLogicalOperators.scala:344)
	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
	...
```
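For convenience, here is a minimal self-contained repro sketch that drives the SQL above from a plain SparkSession. The SQL is taken verbatim from this report; the local-mode session setup and the object name are assumptions for illustration only:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative repro harness for SPARK-49646 (names here are hypothetical).
object Spark49646Repro {
  def main(args: Array[String]): Unit = {
    // Local session purely for reproduction; any catalog should do.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-49646 repro")
      .getOrCreate()

    spark.sql("create table IF NOT EXISTS t(t1 INT, t2 int) using json")
    spark.sql("CREATE TABLE IF NOT EXISTS a (a1 INT) using json")

    // Expected to fail in DecorrelateInnerQuery.rewriteDomainJoins with the
    // internal error shown in the stack trace above: the outer reference
    // t_outer.t1 appears in only one branch of the union.
    spark.sql("""
      select 1 from t as t_outer
      left join lateral (
        select b1, b2 from (
          select a.a1 as b1, 1 as b2 from a
          union
          select t_outer.t1 as b1, null as b2
        ) as t_inner
        where (t_inner.b1 < t_outer.t2 or t_inner.b1 is null)
          and t_inner.b1 = t_outer.t1
        order by t_inner.b1, t_inner.b2 desc limit 1
      ) as lateral_table
    """).collect()

    spark.stop()
  }
}
```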
See this investigation doc for more context: [https://docs.google.com/document/d/1HtBDPKVD6pgGntTXdPVX27xH7PdcKTYNyQJLnwr7T-U/edit?usp=sharing]
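For intuition only, the invariant named in the summary can be modeled outside Spark: when decorrelating a set operation, outer references are collected per child, and domain-join rewriting assumes every outer reference tracked for the parent is covered by the union of the children's collected references. Below is a minimal sketch of that coverage check using plain Scala sets as stand-ins for Catalyst attributes; the function name and the guess about which attribute ends up uncovered are illustrative assumptions, not Spark's actual internals:

```scala
// Hypothetical model of the coverage invariant behind SPARK-49646.
object DomainJoinCoverageSketch {
  type Attr = String // stand-in for a Catalyst outer-reference attribute

  // Every outer reference the parent tracks must appear in at least one
  // child's collected set; anything left over has no attribute to map to
  // when rewriting domain joins, which surfaces as an internal error.
  def uncovered(parentOuterReferences: Set[Attr],
                collectedChildOuterReferences: Seq[Set[Attr]]): Set[Attr] =
    parentOuterReferences -- collectedChildOuterReferences.flatten.toSet

  def main(args: Array[String]): Unit = {
    // In the repro, t_outer.t1 is referenced inside one union branch, while
    // t_outer.t2 is referenced only above the union (in the WHERE clause),
    // so the parent plausibly tracks a reference no child collected.
    val parent   = Set("t_outer.t1", "t_outer.t2")
    val children = Seq(Set.empty[Attr], Set("t_outer.t1"))
    println(uncovered(parent, children)) // Set(t_outer.t2) -> not covered
  }
}
```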