[ https://issues.apache.org/jira/browse/SPARK-26103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-26103:
------------------------------------

    Assignee:     (was: Apache Spark)

> OutOfMemory error with large query plans
> ----------------------------------------
>
>                 Key: SPARK-26103
>                 URL: https://issues.apache.org/jira/browse/SPARK-26103
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2
>         Environment: Amazon EMR 5.19
>                      1 c5.4xlarge master instance
>                      1 c5.4xlarge core instance
>                      2 c5.4xlarge task instances
>            Reporter: Dave DeCaprio
>            Priority: Major
>
> Large query plans can cause OutOfMemory errors in the Spark driver.
>
> We are creating data frames that are not extremely large but contain lots of
> nested joins. These plans execute efficiently because of caching and
> partitioning, but the text version of the query plans generated can be
> hundreds of megabytes. Running many of these in parallel causes our driver
> process to fail.
>
> {{{{Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>   at java.util.Arrays.copyOfRange(Arrays.java:2694)
>   at java.lang.String.<init>(String.java:203)
>   at java.lang.StringBuilder.toString(StringBuilder.java:405)
>   at scala.StringContext.standardInterpolator(StringContext.scala:125)
>   at scala.StringContext.s(StringContext.scala:90)
>   at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:70)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:52)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
> }}}}
>
> A similar error is reported in
> [https://stackoverflow.com/questions/38307258/out-of-memory-error-when-writing-out-spark-dataframes-to-parquet-format]
>
> Code exists to truncate the string if the number of output columns is larger
> than 25, but not if the rest of the query plan is huge.
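
To illustrate the kind of bound the last paragraph of the report finds missing, here is a minimal sketch of a length-capped text accumulator. It is not Spark code and not the fix adopted for this issue: the class name BoundedPlanText, its methods, and the wording of the truncation suffix are all hypothetical. The point is only that capping the buffer while the text is being assembled prevents a string of hundreds of megabytes from ever being materialized, which is where the stack trace above fails (StringBuilder.toString).

```java
/**
 * Hypothetical helper, NOT part of Spark: accumulates text up to a fixed
 * character budget and counts whatever did not fit.
 */
final class BoundedPlanText {
    private final StringBuilder buf = new StringBuilder();
    private final int maxChars;   // assumed budget chosen by the caller
    private long dropped = 0;     // characters discarded after the budget was hit

    BoundedPlanText(int maxChars) {
        if (maxChars < 0) {
            throw new IllegalArgumentException("maxChars must be non-negative");
        }
        this.maxChars = maxChars;
    }

    /** Appends as much of s as still fits; the remainder is only counted. */
    BoundedPlanText append(String s) {
        int room = maxChars - buf.length();
        if (s.length() <= room) {
            buf.append(s);
        } else {
            buf.append(s, 0, room);          // copy only the part that fits
            dropped += s.length() - room;
        }
        return this;
    }

    boolean isTruncated() {
        return dropped > 0;
    }

    /** Returns the kept text, with a short note if anything was dropped. */
    @Override
    public String toString() {
        if (dropped == 0) {
            return buf.toString();
        }
        return buf + "... (" + dropped + " more characters truncated)";
    }
}
```

A caller would append plan fragments one at a time and log only the bounded result, so the driver heap holds at most maxChars characters per plan plus the short suffix.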