[jira] [Created] (SPARK-26103) OutOfMemory error with large query plans

Dave DeCaprio (JIRA) Sat, 17 Nov 2018 12:08:34 -0800

Dave DeCaprio created SPARK-26103:
-------------------------------------

             Summary: OutOfMemory error with large query plans
                 Key: SPARK-26103
                 URL: https://issues.apache.org/jira/browse/SPARK-26103
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.2, 2.3.1, 2.3.0
         Environment: Amazon EMR 5.19 with 1 c5.4xlarge master instance,
                      1 c5.4xlarge core instance, and 2 c5.4xlarge task instances
            Reporter: Dave DeCaprio


Large query plans can cause OutOfMemory errors in the Spark driver.

We are creating DataFrames that are not especially large but contain many 
nested joins.  These plans execute efficiently because of caching and 
partitioning, but the text version of the generated query plans can be hundreds 
of megabytes.  Running many of these queries in parallel causes our driver 
process to run out of heap space and fail.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:2694)
    at java.lang.String.<init>(String.java:203)
    at java.lang.StringBuilder.toString(StringBuilder.java:405)
    at scala.StringContext.standardInterpolator(StringContext.scala:125)
    at scala.StringContext.s(StringContext.scala:90)
    at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:70)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:52)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
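
To illustrate how the text of a plan can grow much faster than the plan itself, here is a toy model in Python. It is only a sketch under stated assumptions: `Node`, `render`, and `self_join` are hypothetical names rather than Spark classes, and whether the affected plans share subtrees in exactly this way is an assumption. The point is that a tree printer which repeats every shared subtree in full can emit output that is exponential in the join depth, even though only a few node objects exist in memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical plan node for illustration; not a Spark class."""
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def render(node: Node, depth: int = 0) -> str:
    """Render the tree as indented text, repeating shared subtrees in full."""
    lines = ["  " * depth + node.name]
    for child in (node.left, node.right):
        if child is not None:
            lines.append(render(child, depth + 1))
    return "\n".join(lines)


def self_join(levels: int) -> Node:
    """Join a plan with itself `levels` times (assumed workload shape)."""
    plan = Node("Scan")
    for _ in range(levels):
        # The same object is used on both sides, so each level adds only
        # one new node in memory but doubles the rendered text.
        plan = Node("Join", plan, plan)
    return plan


def rendered_line_count(levels: int) -> int:
    """Number of text lines produced for a plan with `levels` nested joins."""
    return render(self_join(levels)).count("\n") + 1
```

In this model a plan with `levels` nested joins holds `levels + 1` node objects but renders to `2 ** (levels + 1) - 1` lines, which is one way a modest plan could produce a very large string.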
A similar error is reported on Stack Overflow:
https://stackoverflow.com/questions/38307258/out-of-memory-error-when-writing-out-spark-dataframes-to-parquet-format


Code already exists to truncate the string when the number of output columns 
exceeds 25, but nothing limits the size of the rest of the query plan string.
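
One possible direction, sketched here in Python rather than in Spark's own Scala, is to render the plan into a buffer with a character budget and stop walking the tree once the budget is spent. This is a minimal sketch and not Spark's implementation: `BoundedBuffer`, `render_bounded`, the `(name, children)` tuple representation of a plan, and the truncation suffix are all hypothetical.

```python
class BoundedBuffer:
    """Append-only text buffer that stops accepting text past max_chars."""

    def __init__(self, max_chars: int):
        self.max_chars = max_chars
        self.parts = []
        self.length = 0
        self.truncated = False

    def append(self, text: str) -> bool:
        """Add text; return False once the budget is exhausted."""
        if self.truncated:
            return False
        remaining = self.max_chars - self.length
        if len(text) > remaining:
            # Keep only what still fits, then refuse further appends.
            self.parts.append(text[:remaining])
            self.length += remaining
            self.truncated = True
            return False
        self.parts.append(text)
        self.length += len(text)
        return True

    def to_string(self) -> str:
        suffix = "... (truncated)" if self.truncated else ""
        return "".join(self.parts) + suffix


def render_bounded(plan, buf: BoundedBuffer, depth: int = 0) -> bool:
    """Render a hypothetical (name, children) plan tuple into buf.

    Returns False as soon as the buffer is full so that the remaining
    subtrees are never visited or converted to text.
    """
    if not buf.append("  " * depth + plan[0] + "\n"):
        return False
    for child in plan[1]:
        if not render_bounded(child, buf, depth + 1):
            return False
    return True


def plan_to_string(plan, max_chars: int) -> str:
    """Convenience wrapper: render a plan with a character budget."""
    buf = BoundedBuffer(max_chars)
    render_bounded(plan, buf)
    return buf.to_string()
```

The early return matters more than the final slice: truncating only after the full string has been built would still allocate the hundreds of megabytes that cause the failure.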


