[jira] [Updated] (SPARK-28742) StackOverflowError when using otherwise(col()) in a loop

Ivan Tsukanov (JIRA) Thu, 15 Aug 2019 01:38:14 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-28742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ivan Tsukanov updated SPARK-28742:
----------------------------------
    Description: 
The following code
{code:java}
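import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, when}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// sparkContext and sparkSession are assumed to be already initialized
val rdd = sparkContext.makeRDD(Seq(Row("1")))
val schema = StructType(Seq(
  StructField("c1", StringType)
))

val df = sparkSession.createDataFrame(rdd, schema)
// c1 is referenced twice: in the condition and in the otherwise() branch
val column = when(col("c1").isin("1"), "1").otherwise(col("c1"))

// Overwrite c1 nine times, running an action after every step
(1 to 9).foldLeft(df) { case (acc, _) =>
  val res = acc.withColumn("c1", column)
  res.take(1)
  res
}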
{code}
fails with
{code:java}
java.lang.StackOverflowError
   at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:395)
   ...{code}
The probable cause is that Spark generates an unexpectedly large physical plan:
{code:java}
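import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, when}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// sparkContext and sparkSession are assumed to be already initialized
val rdd = sparkContext.makeRDD(Seq(Row("1")))
val schema = StructType(Seq(
  StructField("c1", StringType)
))

val df = sparkSession.createDataFrame(rdd, schema)
val column = when(col("c1").isin("1"), "1").otherwise(col("c1"))

// No action inside the loop this time; only the final plan is printed
val result = (1 to 9).foldLeft(df) { case (acc, _) =>
  acc.withColumn("c1", column)
}
result.explain()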
{code}
it prints a plan that is 18936 characters long:
{code:java}
== Physical Plan ==
*(1) Project [CASE WHEN (CASE WHEN (CASE WHEN (CASE WHEN (CASE WHEN (CASE WHEN 
(CASE .... 18936 symbols
+- Scan ExistingRDD[c1#1]  {code}
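
One possible explanation for the plan size, sketched as a toy model rather than as Spark's actual implementation: the column expression refers to c1 twice (once in the condition, once in the otherwise() branch). Assuming the optimizer merges the nine projections into one by inlining every reference to c1, the expression would double in size on each iteration. The Expr, Ref, Lit and CaseWhenIn types and the PlanGrowth object below are hypothetical names made up for this illustration; they are not Catalyst classes.

```scala
// Toy expression tree; hypothetical types, not Spark's Catalyst classes.
sealed trait Expr
case class Ref(name: String) extends Expr
case class Lit(value: String) extends Expr
// Models: CASE WHEN subject IN (value) THEN thenValue ELSE orElse END
case class CaseWhenIn(subject: Expr, value: Lit, thenValue: Lit, orElse: Expr) extends Expr

object PlanGrowth {
  // Replace every reference to `name` inside `e` with `replacement`.
  def substitute(e: Expr, name: String, replacement: Expr): Expr = e match {
    case Ref(n) if n == name => replacement
    case CaseWhenIn(s, v, t, o) =>
      CaseWhenIn(substitute(s, name, replacement), v, t, substitute(o, name, replacement))
    case other => other
  }

  // Number of column references (leaves of type Ref) in the tree.
  def refs(e: Expr): Long = e match {
    case Ref(_)                 => 1L
    case Lit(_)                 => 0L
    case CaseWhenIn(s, _, _, o) => refs(s) + refs(o)
  }

  // Number of CASE WHEN nodes in the tree.
  def caseWhens(e: Expr): Long = e match {
    case CaseWhenIn(s, _, _, o) => 1L + caseWhens(s) + caseWhens(o)
    case _                      => 0L
  }

  // Models when(col("c1").isin("1"), "1").otherwise(col("c1"))
  val column: Expr = CaseWhenIn(Ref("c1"), Lit("1"), Lit("1"), Ref("c1"))

  // Expression for c1 after n withColumn("c1", column) calls have been inlined.
  def collapsed(n: Int): Expr =
    (1 to n).foldLeft[Expr](Ref("c1")) { (acc, _) => substitute(column, "c1", acc) }

  def main(args: Array[String]): Unit =
    (1 to 9).foreach { n =>
      val e = collapsed(n)
      println(s"$n iterations: ${caseWhens(e)} CASE WHEN nodes, ${refs(e)} references to c1")
    }
}
```

In this model nine iterations give 511 nested CASE WHEN nodes and 512 references to c1, so the growth is exponential in the number of iterations rather than linear. Whether this is what actually happens inside Spark for this issue is an assumption that the model does not prove.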

  was:
The same text and code as above, except that the plan excerpt did not carry the "18936 symbols" note.


> StackOverflowError when using otherwise(col()) in a loop
> --------------------------------------------------------
>
>                 Key: SPARK-28742
>                 URL: https://issues.apache.org/jira/browse/SPARK-28742
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0, 2.4.3
>            Reporter: Ivan Tsukanov
>            Priority: Major


