[
https://issues.apache.org/jira/browse/SPARK-33502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236434#comment-17236434
]
Arwin S Tio edited comment on SPARK-33502 at 11/10/21, 7:22 PM:
----------------------------------------------------------------
Note: running my program with "-Xss3072k" fixed it. Giving the driver JVM a larger stack lets the recursive plan/closure serialization go deeper, so more SELECT columns can be handled before the stack overflows (a thread-level alternative is sketched below).
was (Author: cozos):
Note, running my program with "-Xss3072k" fixed it
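
For readers who cannot change the JVM flags, a roughly equivalent workaround is to run the driver-side logic on a thread created with an explicit stack size. This is a hypothetical sketch, not something from the ticket; the class and method names are made up for illustration, and the 3 MB figure simply mirrors the -Xss3072k value above:

{code:java}
// Hypothetical sketch: give only the thread that builds and writes the wide
// DataFrame a larger stack, instead of raising -Xss for the whole JVM.
public class BigStackRunner {
    public static void main(String[] args) throws InterruptedException {
        long stackSizeBytes = 3L * 1024 * 1024; // roughly -Xss3072k

        Thread worker = new Thread(
            null,                        // default thread group
            BigStackRunner::runSparkJob, // the Spark code from the report below
            "spark-driver-big-stack",
            stackSizeBytes);             // requested stack size (the JVM may treat this as a hint)
        worker.start();
        worker.join();
    }

    private static void runSparkJob() {
        // ... same SparkSession / select / write logic as in TestSparkStackOverflow below ...
    }
}
{code}

Note that the stackSize argument of the Thread constructor is platform-dependent and may be ignored on some JVMs, so the -Xss flag remains the more reliable option.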
> Large number of SELECT columns causes StackOverflowError
> --------------------------------------------------------
>
> Key: SPARK-33502
> URL: https://issues.apache.org/jira/browse/SPARK-33502
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.7
> Reporter: Arwin S Tio
> Priority: Minor
>
> On Spark 2.4.7 Standalone Mode on my laptop (Macbook Pro 2015), I ran the
> following:
> {code:java}
> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.List;
>
> import org.apache.spark.sql.Column;
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.RowFactory;
> import org.apache.spark.sql.SaveMode;
> import org.apache.spark.sql.SparkSession;
> import org.apache.spark.sql.types.DataTypes;
> import org.apache.spark.sql.types.StructType;
> import scala.collection.JavaConverters;
>
> import static org.apache.spark.sql.functions.lit;
>
> public class TestSparkStackOverflow {
>   public static void main(String[] args) {
>     SparkSession spark = SparkSession
>       .builder()
>       .config("spark.master", "local[8]")
>       .appName(TestSparkStackOverflow.class.getSimpleName())
>       .getOrCreate();
>
>     // Input with a single string column "foo" and three rows.
>     StructType inputSchema = new StructType();
>     inputSchema = inputSchema.add("foo", DataTypes.StringType);
>
>     Dataset<Row> inputDf = spark.createDataFrame(
>       Arrays.asList(
>         RowFactory.create("1"),
>         RowFactory.create("2"),
>         RowFactory.create("3")
>       ),
>       inputSchema
>     );
>
>     // Build 3000 literal string columns plus the original "foo" column.
>     List<Column> lotsOfColumns = new ArrayList<>();
>     for (int i = 0; i < 3000; i++) {
>       lotsOfColumns.add(lit("").as("field" + i).cast(DataTypes.StringType));
>     }
>     lotsOfColumns.add(new Column("foo"));
>
>     // Selecting all 3001 columns and writing to CSV triggers the StackOverflowError.
>     inputDf
>       .select(JavaConverters.collectionAsScalaIterableConverter(lotsOfColumns).asScala().toSeq())
>       .write()
>       .format("csv")
>       .mode(SaveMode.Append)
>       .save("file:///tmp/testoutput");
>   }
> }
> {code}
>
> And I get a StackOverflowError:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: Job aborted.
>   at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
>   at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
>   at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
>   at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
>   at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
>   at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
>   at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
>   at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
>   at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
>   at udp.task.TestSparkStackOverflow.main(TestSparkStackOverflow.java:52)
> Caused by: java.lang.StackOverflowError
>   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1522)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   ... redacted
> {code}
>
> The StackOverflowError goes away once the number of generated columns is reduced to roughly 500.
>
> When running it through a debugger, I found that the overflow happens while serializing the [executeTask|https://github.com/apache/spark/blob/14211a19f53bd0f413396582c8970e3e0a74281d/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L169-L177] function in [ClosureCleaner|https://github.com/apache/spark/blob/14211a19f53bd0f413396582c8970e3e0a74281d/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala#L405-L407].