[ https://issues.apache.org/jira/browse/SPARK-17426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Zhong updated SPARK-17426:
-------------------------------
    Target Version/s: 2.1.0
         Description: 
In SPARK-17356, we fixed the OOM issue that occurs when Metadata is very large. 
There are other cases that may also trigger OOM. The current implementation of 
TreeNode.toJSON recursively traverses and prints all fields of the current 
TreeNode, even fields of type Seq or Map. 

This is not safe because:
1. The Seq or Map can be very large. Converting it to JSON may take a huge 
amount of memory, which can trigger an out-of-memory error.
2. Some user-space input may also be propagated to the plan. Such input can be 
of arbitrary type, and may even be self-referencing. Trying to convert 
user-space input to JSON is very risky.
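To make both failure modes concrete, here is a minimal Spark-free sketch of a guarded renderer. `SafeJson`, `MaxElements` and the `SelfRef` stand-in are hypothetical names, not Spark APIs, and this is only one possible approach, not necessarily how this issue should be fixed. It caps how many elements of a Seq or Map are printed (point 1) and tracks the objects on the current recursion path by identity, so a self-reference is printed as a marker instead of being recursed into (point 2).

{code}
import java.util.{Collections, IdentityHashMap}

// Hypothetical sketch, not Spark's TreeNode.toJSON: renders nested values as
// JSON-like text, capping collection sizes and cutting reference cycles.
object SafeJson {
  // Assumed limit on how many elements of a Seq or Map are rendered.
  val MaxElements = 3

  def render(value: Any): String = {
    // Identity-based set: never calls hashCode/equals on user objects, which
    // may themselves recurse forever on a self-referencing value.
    val onPath = Collections.newSetFromMap(new IdentityHashMap[AnyRef, java.lang.Boolean]())
    renderValue(value, onPath)
  }

  private def renderValue(value: Any, onPath: java.util.Set[AnyRef]): String = value match {
    case null       => "null"
    case s: String  => "\"" + s + "\"" // no escaping in this sketch
    case b: Boolean => b.toString
    case n: Int     => n.toString
    case ref: AnyRef if onPath.contains(ref) =>
      "\"<cycle>\"" // already being rendered higher up: stop here
    case m: scala.collection.Map[_, _] =>
      visit(m, onPath) {
        val shown = m.toSeq.take(MaxElements).map { case (k, v) =>
          "\"" + k + "\":" + renderValue(v, onPath)
        }
        val rest = if (m.size > MaxElements) Seq("\"<truncated>\":" + (m.size - MaxElements)) else Nil
        (shown ++ rest).mkString("{", ",", "}")
      }
    case s: Seq[_] =>
      visit(s, onPath) {
        val shown = s.take(MaxElements).map(renderValue(_, onPath))
        val rest = if (s.size > MaxElements) Seq("\"<" + (s.size - MaxElements) + " more>\"") else Nil
        (shown ++ rest).mkString("[", ",", "]")
      }
    case p: Product => // case classes: render their fields one by one
      visit(p, onPath) {
        p.productIterator.map(renderValue(_, onPath)).mkString("{\"" + p.productPrefix + "\":[", ",", "]}")
      }
    case other => "\"" + other.getClass.getName + "\"" // avoid calling user toString
  }

  // Marks `ref` as being on the current recursion path while `body` runs.
  private def visit(ref: Any, onPath: java.util.Set[AnyRef])(body: => String): String = {
    val key = ref.asInstanceOf[AnyRef]
    onPath.add(key)
    try body finally onPath.remove(key)
  }
}

// Hypothetical Spark-free stand-in for a self-referencing UDF config.
case class SelfRef(var config: Map[String, Any] = Map.empty[String, Any]) {
  config += "self" -> this
}
{code}

For example, SafeJson.render(SelfRef()) returns {"SelfRef":[{"self":"<cycle>"}]}, and SafeJson.render(Seq(1, 2, 3, 4, 5)) returns [1,2,3,"<2 more>"]. The identity-based set matters: SelfRef could not be stored in an ordinary HashSet, because its case-class hashCode would itself recurse through the cycle. Unknown types are rendered by class name rather than toString for the same reason.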

The following example triggers a StackOverflowError when calling toJSON on a 
ScalaUDF expression that wraps a self-referencing user-defined function.
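{code}

import org.apache.spark.sql.catalyst.dsl.expressions._  // provides "col1".attr
import org.apache.spark.sql.catalyst.expressions.ScalaUDF
import org.apache.spark.sql.types.BooleanType

// A UDF whose config map holds a reference to the UDF itself.
case class SelfReferenceUDF(
    var config: Map[String, Any] = Map.empty[String, Any])
  extends Function1[String, Boolean] {
  config += "self" -> this
  def apply(key: String): Boolean = config.contains(key)
}

// Runs inside a Spark SQL test suite (e.g. one extending SparkFunSuite).
test("toJSON should not throw java.lang.StackOverflowError") {
  val udf = ScalaUDF(SelfReferenceUDF(), BooleanType, Seq("col1".attr))
  // triggers java.lang.StackOverflowError
  udf.toJSON
}

{code}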
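The same failure can be reproduced without Spark. The sketch below uses hypothetical names (`SelfRef`, `NaiveJson`) and assumes a renderer that, like the toJSON behavior described above, walks every case-class field and map value without remembering what it has already visited.

{code}
// Hypothetical Spark-free stand-in for SelfReferenceUDF above.
case class SelfRef(var config: Map[String, Any] = Map.empty[String, Any]) {
  config += "self" -> this
}

// Naive renderer: follows every field and map value and never
// records which objects it has already visited.
object NaiveJson {
  def render(value: Any): String = value match {
    case s: String    => "\"" + s + "\""
    case m: Map[_, _] => m.toSeq.map { case (k, v) => "\"" + k + "\":" + render(v) }.mkString("{", ",", "}")
    case p: Product   => p.productIterator.map(render).mkString("[", ",", "]")
    case other        => "\"" + other + "\""
  }

  // Returns true if rendering `value` exhausts the JVM stack.
  def overflowsOn(value: Any): Boolean =
    try { render(value); false }
    catch { case _: StackOverflowError => true }
}
{code}

NaiveJson.overflowsOn(SelfRef()) returns true: SelfRef's only field is its config map, whose "self" value leads back to the same SelfRef, so render never reaches a base case.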



> Current TreeNode.toJSON may trigger OOM under some corner cases
> ---------------------------------------------------------------
>
>                 Key: SPARK-17426
>                 URL: https://issues.apache.org/jira/browse/SPARK-17426
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Sean Zhong
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
