[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

ASF GitHub Bot (JIRA) Tue, 11 Dec 2018 04:15:49 -0800


    [ https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716986#comment-16716986 ]


ASF GitHub Bot commented on SPARK-26224:
----------------------------------------

AmplabJenkins commented on issue #23285: [SPARK-26224][SQL] Avoid creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446181893
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99969/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-26224
>                 URL: https://issues.apache.org/jira/browse/SPARK-26224
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>         Environment: MacBook, IntelliJ IDEA. The sample code was run as a unit test.
>            Reporter: Dorjee Tsering
>            Priority: Minor
>
> Reproduction steps:
> Run this sample code on your laptop. It tries to add 3000 new columns to a
> base DataFrame that has 1 column.
>  
>  
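> {code:java}
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.functions.lit
> import org.apache.spark.sql.types.{DataTypes, StructField}
> import spark.implicits._
>
> // 3000 column definitions; only their names are used below.
> val newColumnsToBeAdded: Seq[StructField] =
>   for (i <- 1 to 3000) yield StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq(1).toDF("employee_id")
> // Each withColumn call wraps the previous plan in another projection.
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) =>
>   df.withColumn(newColumn.name, lit(0)))
> result.show(false)
> {code}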
> This ends with the following stack trace:
> java.lang.StackOverflowError
>  at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
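
The repeating transformDown/mapChildren frames in the trace suggest recursion over a deeply nested plan, which fits the linked pull request's aim of avoiding many projections from repeated withColumn calls. A possible workaround, not taken from the report or from pull request #23285, is to add all the columns in one select so the plan should stay shallow. The sketch below assumes a local SparkSession; the object name SingleSelectSketch, the helper withZeroColumns, the master setting and the app name are all hypothetical, chosen only for illustration.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object SingleSelectSketch {
  // Hypothetical helper: appends one constant-zero column per name in a single select.
  def withZeroColumns(df: DataFrame, names: Seq[String]): DataFrame = {
    val newColumns: Seq[Column] = names.map(name => lit(0).as(name))
    // Keep every existing column and add all new ones in one projection,
    // rather than one projection per withColumn call.
    df.select(col("*") +: newColumns: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("single-select-sketch")
      .getOrCreate()
    import spark.implicits._

    val baseDataFrame: DataFrame = Seq(1).toDF("employee_id")
    val result = withZeroColumns(baseDataFrame, (1 to 3000).map("field_" + _))

    // The base column plus the 3000 added columns.
    println(result.columns.length)
    spark.stop()
  }
}
```

Whether this avoids the StackOverflowError on 2.3.0 has not been verified here. It only reduces the number of nested plan nodes that the recursive tree traversal has to descend through.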



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
