[ https://issues.apache.org/jira/browse/SPARK-28742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Tsukanov updated SPARK-28742:
----------------------------------
Description:
The following code

{code:java}
val rdd = sparkContext.makeRDD(Seq(Row("1")))
val schema = StructType(Seq(
  StructField("c1", StringType)
))
val df = sparkSession.createDataFrame(rdd, schema)

val column = when(col("c1").isin("1"), "1").otherwise(col("c1"))
(1 to 9).foldLeft(df) { case (acc, _) =>
  val res = acc.withColumn("c1", column)
  res.take(1)
  res
}
{code}

fails with

{code:java}
java.lang.StackOverflowError
  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:395)
  ...
{code}

The problem seems to be that Spark generates an inexplicably large physical plan -

{code:java}
val rdd = sparkContext.makeRDD(Seq(Row("1")))
val schema = StructType(Seq(
  StructField("c1", StringType)
))
val df = sparkSession.createDataFrame(rdd, schema)

val column = when(col("c1").isin("1"), "1").otherwise(col("c1"))
val result = (1 to 9).foldLeft(df) { case (acc, _) =>
  acc.withColumn("c1", column)
}
result.explain()
{code}

it shows a plan that is 18936 characters long:

{code:java}
== Physical Plan ==
*(1) Project [CASE WHEN (CASE WHEN (CASE WHEN (CASE WHEN (CASE WHEN (CASE WHEN (CASE .... 18936 symbols
+- Scan ExistingRDD[c1#1]
{code}
> StackOverflowError when using otherwise(col()) in a loop
> --------------------------------------------------------
>
>                 Key: SPARK-28742
>                 URL: https://issues.apache.org/jira/browse/SPARK-28742
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0, 2.4.3
>            Reporter: Ivan Tsukanov
>            Priority: Major

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
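A plain-Scala sketch of why the plan blows up (a hypothetical simplification, no Spark needed — Catalyst expressions are trees, not strings, but the growth mechanism is the same): `column` references `c1` twice (once in the `isin` condition, once in `otherwise`), so each `withColumn("c1", column)` substitutes the full previous expression into both places, roughly doubling the nested CASE WHEN text on every iteration.

{code:java}
object CaseWhenGrowth {
  // Mimic what folding when(col("c1").isin("1"), "1").otherwise(col("c1"))
  // over itself does to the expression text: each iteration wraps the
  // previous expression in a CASE WHEN that mentions it twice.
  def nested(iterations: Int): String =
    (1 to iterations).foldLeft("c1") { case (expr, _) =>
      s"CASE WHEN ($expr IN (1)) THEN 1 ELSE $expr END"
    }

  def main(args: Array[String]): Unit = {
    // Length follows L(n) = 2 * L(n-1) + constant, i.e. exponential growth:
    // 40, 116, 268, ... characters for 1, 2, 3, ... iterations.
    println(nested(9).length)
  }
}
{code}

After 9 iterations this toy expression is already on the order of the 18936-character plan reported above, which is what drives Janino's flow analysis into a StackOverflowError during codegen.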