[
https://issues.apache.org/jira/browse/SPARK-10155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shixiong Zhu updated SPARK-10155:
---------------------------------
Description:
I saw a lot of `ThreadLocal` objects in the following app:
{code}
import org.apache.spark._
import org.apache.spark.sql._

object SparkApp {

  def foo(sqlContext: SQLContext): Unit = {
    import sqlContext.implicits._
    sqlContext.sparkContext
      .parallelize(Seq("aaa", "bbb", "ccc"))
      .toDF()
      .filter("length(_1) > 0")
      .count()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sql-memory-leak")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    while (true) {
      foo(sqlContext)
    }
  }
}
{code}
Running the above code for a long time eventually leads to an OOM.
These `ThreadLocal`s come from
"scala.util.parsing.combinator.Parsers.lastNoSuccessVar", which stores
`Failure("end of input", ...)`.
There is a Scala issue here: https://issues.scala-lang.org/browse/SI-9010
and some discussion here: https://issues.scala-lang.org/browse/SI-4929
I tried to clear "lastNoSuccessVar" via reflection but failed because of the
complicated bytecode generated by Scala's trait mixins.
It looks like the best solution is to reuse the parser?
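To illustrate the mechanism, here is a minimal, self-contained sketch (the `LeakyParser` class is hypothetical, not Spark's or Scala's actual parser): each `Parsers` instance owns its own `DynamicVariable`, which is backed by an `InheritableThreadLocal`, so constructing a fresh parser per query adds one entry per call to the calling thread's `ThreadLocalMap`, while reusing a single instance keeps it at one.

```scala
import scala.util.DynamicVariable

// Hypothetical stand-in for scala.util.parsing.combinator.Parsers:
// each instance creates its own DynamicVariable, which is backed by
// an InheritableThreadLocal -- just like `lastNoSuccessVar`.
class LeakyParser {
  val lastNoSuccessVar = new DynamicVariable[Option[String]](None)

  def parse(input: String): Option[String] = {
    // Parsing records a "last failure" on the current thread, even
    // when the overall parse succeeds.
    lastNoSuccessVar.value = Some("end of input")
    lastNoSuccessVar.value
  }
}

object LeakDemo extends App {
  // A fresh parser per call: each call registers one more ThreadLocal
  // on the calling thread, which is what accumulates under the
  // `while (true) { foo(sqlContext) }` loop above.
  (1 to 3).foreach(i => new LeakyParser().parse(s"q$i"))

  // Reusing one parser instance keeps a single ThreadLocal per thread.
  val shared = new LeakyParser()
  (1 to 3).foreach(i => shared.parse(s"q$i"))
  println(shared.lastNoSuccessVar.value) // prints Some(end of input)
}
```

The sketch only models the per-instance `ThreadLocal` bookkeeping; in the real leak the retained `Failure` also references the input `Reader`, which can keep large buffers alive.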
was:
I saw a lot of `ThreadLocal` objects in the following app:
{code}
import org.apache.spark._
import org.apache.spark.sql._

object SparkApp {

  def foo(sqlContext: SQLContext): Unit = {
    import sqlContext.implicits._
    sqlContext.sparkContext
      .parallelize(Seq("aaa", "bbb", "ccc"))
      .toDF()
      .filter("length(_1) > 0")
      .count()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sql-memory-leak")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    while (true) {
      foo(sqlContext)
    }
  }
}
{code}
Running the above code for a long time eventually leads to an OOM.
These `ThreadLocal`s come from
"scala.util.parsing.combinator.Parsers.lastNoSuccessVar", which stores
`Failure("end of input", ...)`.
There is a Scala issue here: https://issues.scala-lang.org/browse/SI-9010
and some discussion here: https://issues.scala-lang.org/browse/SI-4929
I tried to fix it using reflection but failed because of the complicated
bytecode generated by Scala's trait mixins.
It looks like the best solution is to reuse the parser?
> Memory leak in SQL parsers
> --------------------------
>
> Key: SPARK-10155
> URL: https://issues.apache.org/jira/browse/SPARK-10155
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Shixiong Zhu
> Priority: Critical
>
> I saw a lot of `ThreadLocal` objects in the following app:
> {code}
> import org.apache.spark._
> import org.apache.spark.sql._
>
> object SparkApp {
>
>   def foo(sqlContext: SQLContext): Unit = {
>     import sqlContext.implicits._
>     sqlContext.sparkContext
>       .parallelize(Seq("aaa", "bbb", "ccc"))
>       .toDF()
>       .filter("length(_1) > 0")
>       .count()
>   }
>
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf().setAppName("sql-memory-leak")
>     val sc = new SparkContext(conf)
>     val sqlContext = new SQLContext(sc)
>     while (true) {
>       foo(sqlContext)
>     }
>   }
> }
> {code}
> Running the above code for a long time eventually leads to an OOM.
> These `ThreadLocal`s come from
> "scala.util.parsing.combinator.Parsers.lastNoSuccessVar", which stores
> `Failure("end of input", ...)`.
> There is a Scala issue here: https://issues.scala-lang.org/browse/SI-9010
> and some discussion here: https://issues.scala-lang.org/browse/SI-4929
> I tried to clear "lastNoSuccessVar" via reflection but failed because of the
> complicated bytecode generated by Scala's trait mixins.
> It looks like the best solution is to reuse the parser?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]