[ https://issues.apache.org/jira/browse/SPARK-46981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837039#comment-17837039 ]
Jarred Li commented on SPARK-46981:
-----------------------------------

I used the default driver memory setting (1 GB), and the OOM was thrown. It looks like a Catalyst issue.

> Driver OOM happens in query planning phase with empty tables
> ------------------------------------------------------------
>
>                 Key: SPARK-46981
>                 URL: https://issues.apache.org/jira/browse/SPARK-46981
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0
>         Environment: * OSS Spark 3.5.0
>                      * Amazon EMR Spark 3.3.0 (EMR release label 6.9.0)
>                      * AWS Glue Spark 3.3.0 (Glue version 4.0)
>            Reporter: Noritaka Sekiyama
>            Priority: Major
>         Attachments: create_sanitized_tables.py, test_and_twodays_simplified.sql
>
>
> We have observed that driver OOM happens in the query planning phase, even with empty tables, when we run specific patterns of queries.
>
> h2. Issue details
> If we run the query with the where condition {{pt>='20231004' and pt<='20231004'}}, the query fails in the planning phase due to driver OOM, more specifically {{java.lang.OutOfMemoryError: GC overhead limit exceeded}}.
> If we change the where condition from {{pt>='20231004' and pt<='20231004'}} to {{pt='20231004' or pt='20231005'}}, the SQL runs without any error.
> This issue happens even with an empty table, and it happens before any actual data is loaded. It appears to be an issue on the Catalyst side.
>
> h2. Reproduction steps
> Attaching a script and a query that reproduce the issue:
> * create_sanitized_tables.py: script to create the table definitions
> ** No need to place any data files, since the issue occurs with an empty location.
> * test_and_twodays_simplified.sql: query to reproduce the issue
>
> Here's the typical stack trace:
> {noformat}
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at scala.collection.immutable.Vector.iterator(Vector.scala:100)
>     at scala.collection.immutable.Vector.iterator(Vector.scala:69)
>     at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>     at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>     at scala.collection.generic.GenericTraversableTemplate.transpose(GenericTraversableTemplate.scala:219)
>     at scala.collection.generic.GenericTraversableTemplate.transpose$(GenericTraversableTemplate.scala:211)
>     at scala.collection.AbstractTraversable.transpose(Traversable.scala:108)
>     at org.apache.spark.sql.catalyst.plans.logical.Union.output(basicLogicalOperators.scala:461)
>     at org.apache.spark.sql.catalyst.plans.logical.Window.output(basicLogicalOperators.scala:1205)
>     at org.apache.spark.sql.catalyst.planning.PhysicalOperation$.$anonfun$unapply$2(patterns.scala:119)
>     at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$$Lambda$1874/539825188.apply(Unknown Source)
>     at scala.Option.getOrElse(Option.scala:189)
>     at org.apache.spark.sql.catalyst.planning.PhysicalOperation$.unapply(patterns.scala:119)
>     at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:307)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner$$Lambda$2114/1104718965.apply(Unknown Source)
>     at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
>     at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:70)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner$$Lambda$2117/2079515765.apply(Unknown Source)
>     at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196)
>     at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194)
>     at scala.collection.Iterator.foreach(Iterator.scala:943)
>     at scala.collection.Iterator.foreach$(Iterator.scala:943)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>     at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199)
>     at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192)
>     at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
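The repeated {{Union.output}} -> {{transpose}} frames in the trace suggest the planner recomputes a Union's output schema every time a strategy pattern-matches on the plan. Purely as an illustration of why that multiplies into GC pressure during planning (the classes below are hypothetical toys named after the Catalyst frames, not Spark code, and this is not the confirmed root cause), a minimal Python sketch:

```python
# Toy model of the hot path visible in the stack trace:
# PhysicalOperation.unapply repeatedly asks operators for output(), and
# Union.output recomputes a transpose of all child outputs on every call.
# Hypothetical simplification for illustration only -- not Spark internals.

class Leaf:
    calls = 0  # counts how often leaf schemas are recomputed

    def __init__(self, attrs):
        self.attrs = attrs

    def output(self):
        Leaf.calls += 1
        return list(self.attrs)

class Union:
    def __init__(self, children):
        self.children = children

    def output(self):
        # No caching: every call re-walks the whole subtree, then transposes
        # the per-child attribute lists, mirroring the transpose frames.
        cols = list(zip(*[child.output() for child in self.children]))
        return [col[0] for col in cols]

def make_plan(depth, width):
    # Build a tree of nested unions over identical empty-table leaves.
    if depth == 0:
        return Leaf(["pt", "value"])
    return Union([make_plan(depth - 1, width) for _ in range(width)])

plan = make_plan(3, 4)          # 4**3 = 64 leaves
for _ in range(10):             # e.g. ten strategy pattern-matches
    plan.output()
print(Leaf.calls)               # prints 640: 10 * 64 schema recomputations
```

With wide or deeply nested unions (as a partition-range predicate can produce after rewriting), each extra pattern-match repeats the full subtree walk and allocates fresh intermediate collections, which is consistent with "GC overhead limit exceeded" before any data is read. Caching the computed output would make the same walk linear; the actual fix in Spark may differ.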