Noritaka Sekiyama created SPARK-46981: -----------------------------------------
Summary: Driver OOM happens in query planning phase with empty tables
Key: SPARK-46981
URL: https://issues.apache.org/jira/browse/SPARK-46981
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.5.0
Environment:
* OSS Spark 3.5.0
* Amazon EMR Spark 3.3.0 (EMR release label 6.9.0)
* AWS Glue Spark 3.3.0 (Glue version 4.0)
Reporter: Noritaka Sekiyama
Attachments: create_sanitized_tables.py

We have observed a driver OOM in the query planning phase, even with empty tables, when running specific patterns of queries.

h2. Issue details

If we run the query with the where condition {{pt>='20231004' and pt<='20231004'}}, the query fails in the planning phase with a driver OOM, specifically {{java.lang.OutOfMemoryError: GC overhead limit exceeded}}.

If we change the where condition from {{pt>='20231004' and pt<='20231004'}} to {{pt='20231004' or pt='20231005'}}, the SQL runs without any error.

The issue occurs even with empty tables, before any data is loaded, so it appears to be an issue on the Catalyst side.

h2. Reproduction step

Attaching the script and query to reproduce the issue.
* create_sanitized_tables.py: Script to create table definitions
* test_and_twodays_simplified.sql: Query to reproduce the issue

Here's the typical stacktrace:

{{
java.lang.OutOfMemoryError: GC overhead limit exceeded
  at scala.collection.immutable.Vector.iterator(Vector.scala:100)
  at scala.collection.immutable.Vector.iterator(Vector.scala:69)
  at scala.collection.IterableLike.foreach(IterableLike.scala:74)
  at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
  at scala.collection.generic.GenericTraversableTemplate.transpose(GenericTraversableTemplate.scala:219)
  at scala.collection.generic.GenericTraversableTemplate.transpose$(GenericTraversableTemplate.scala:211)
  at scala.collection.AbstractTraversable.transpose(Traversable.scala:108)
  at org.apache.spark.sql.catalyst.plans.logical.Union.output(basicLogicalOperators.scala:461)
  at org.apache.spark.sql.catalyst.plans.logical.Window.output(basicLogicalOperators.scala:1205)
  at org.apache.spark.sql.catalyst.planning.PhysicalOperation$.$anonfun$unapply$2(patterns.scala:119)
  at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$$Lambda$1874/539825188.apply(Unknown Source)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.catalyst.planning.PhysicalOperation$.unapply(patterns.scala:119)
  at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:307)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$Lambda$2114/1104718965.apply(Unknown Source)
  at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
  at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:70)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$Lambda$2117/2079515765.apply(Unknown Source)
  at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196)
  at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194)
  at scala.collection.Iterator.foreach(Iterator.scala:943)
  at scala.collection.Iterator.foreach$(Iterator.scala:943)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
  at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199)
  at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192)
  at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431)
}}
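
The report notes that replacing the range predicate on {{pt}} with OR-ed equality predicates let the query plan successfully. As a rough sketch of that rewrite, the helper below expands an inclusive date range into the equality form. The function name {{expand_partition_range}} is hypothetical, and it assumes {{pt}} is a string partition column in {{yyyyMMdd}} format, as the literals in the report suggest. This only mirrors the workaround the reporter observed; it is not a fix for the planner, and it has not been verified against other queries.

```python
from datetime import datetime, timedelta


def expand_partition_range(column: str, start: str, end: str, fmt: str = "%Y%m%d") -> str:
    """Rewrite an inclusive range on a date-like string partition column
    as OR-ed equality predicates (hypothetical helper, not part of Spark)."""
    first = datetime.strptime(start, fmt).date()
    last = datetime.strptime(end, fmt).date()
    if last < first:
        raise ValueError("end must not be earlier than start")

    # Enumerate every day in the inclusive range [first, last].
    day_count = (last - first).days + 1
    values = [(first + timedelta(days=i)).strftime(fmt) for i in range(day_count)]

    # Build: pt='20231004' or pt='20231005' ...
    return " or ".join(f"{column}='{value}'" for value in values)


if __name__ == "__main__":
    # Range form:    pt>='20231004' and pt<='20231005'
    # Equality form: printed below
    print(expand_partition_range("pt", "20231004", "20231005"))
```

For long ranges the generated predicate grows by one term per day, so this is only practical for short windows like the one in the report.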