Avery Qi created SPARK-49646:
--------------------------------

             Summary: Fix subquery decorrelation for union / set operations when parentOuterReferences has references not covered in collectedChildOuterReferences
                 Key: SPARK-49646
                 URL: https://issues.apache.org/jira/browse/SPARK-49646
             Project: Spark
          Issue Type: Bug
          Components: Optimizer
    Affects Versions: 4.0.0
            Reporter: Avery Qi


Spark currently cannot handle queries like the following:
```
create table IF NOT EXISTS t (t1 INT, t2 int) using json;
CREATE TABLE IF NOT EXISTS a (a1 INT) using json;

select 1
from t as t_outer
left join
   lateral (
       select b1, b2
       from
       (
           select
               a.a1 as b1,
               1 as b2
           from a
           union
           select t_outer.t1 as b1,
                  null as b2
       ) as t_inner
       where (t_inner.b1 < t_outer.t2 or t_inner.b1 is null) and t_inner.b1 = t_outer.t1
       order by t_inner.b1, t_inner.b2 desc limit 1
   ) as lateral_table
```
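As a rough illustration of the condition named in the summary, the sketch below models the two reference sets for this query with plain Python sets. It is a toy model under stated assumptions, not Spark's `DecorrelateInnerQuery` code: the set contents are a reading of the query above (the filter over the union uses `t_outer.t1` and `t_outer.t2`, while only the second union branch is correlated, through `t_outer.t1`), and the helper `uncovered_references` is a hypothetical name introduced only for this example.

```python
# Toy model only: plain Python sets stand in for the attribute sets named in
# the issue summary. This is NOT Spark's DecorrelateInnerQuery implementation.

# Assumption: outer columns referenced above the UNION. The WHERE clause of
# the lateral subquery uses both t_outer.t1 and t_outer.t2.
parent_outer_references = {"t_outer.t1", "t_outer.t2"}

# Assumption: outer columns referenced inside the UNION's children. Only the
# second branch is correlated, and only through t_outer.t1.
collected_child_outer_references = {"t_outer.t1"}


def uncovered_references(parent_refs, child_refs):
    """Hypothetical helper: parent references that no UNION child mentions."""
    return sorted(parent_refs - child_refs)


missing = uncovered_references(
    parent_outer_references, collected_child_outer_references
)
print(missing)  # prints ['t_outer.t2']
```

Under these assumptions, `t_outer.t2` is the parent outer reference that the union's children do not cover, which is the situation the summary describes.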

The error stack trace is:

org.apache.spark.SparkException: <Redacted Exception Message>
  at org.apache.spark.SparkException$.internalError(SparkException.scala:97)
  at org.apache.spark.SparkException$.internalError(SparkException.scala:101)
  at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:447)
  at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
  at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
  at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
  at org.apache.spark.sql.catalyst.plans.logical.Project.mapChildren(basicLogicalOperators.scala:87)
  at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
  at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$5(DecorrelateInnerQuery.scala:453)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at scala.collection.TraversableLike.map(TraversableLike.scala:286)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
  at scala.collection.AbstractTraversable.map(Traversable.scala:108)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:744)
  at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:451)
  at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
  at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
  at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
  at org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:1470)
  at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
  at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
  at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
  at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
  at org.apache.spark.sql.catalyst.plans.logical.Filter.mapChildren(basicLogicalOperators.scala:344)
  at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
  at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)

...

 

See this investigation doc for more context:

https://docs.google.com/document/d/1HtBDPKVD6pgGntTXdPVX27xH7PdcKTYNyQJLnwr7T-U/edit?usp=sharing


