[ https://issues.apache.org/jira/browse/SPARK-14113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-14113:
------------------------------------

    Assignee: (was: Apache Spark)

> Consider marking JobConf closure-cleaning in HadoopRDD as optional
> ------------------------------------------------------------------
>
>                 Key: SPARK-14113
>                 URL: https://issues.apache.org/jira/browse/SPARK-14113
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>
> In HadoopRDD, the following code was introduced as part of SPARK-6943:
> {noformat}
> if (initLocalJobConfFuncOpt.isDefined) {
>   sparkContext.clean(initLocalJobConfFuncOpt.get)
> }
> {noformat}
> When working on one of the changes in OrcRelation, I tried passing
> initLocalJobConfFuncOpt to HadoopRDD, and that incurred a significant
> performance penalty (due to closure cleaning) with large RDDs. The cleaning
> would be invoked for every HadoopRDD initialization, causing the bottleneck.
> An example thread stack is given below:
> {noformat}
> at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
> at org.apache.xbean.asm5.ClassReader.readUTF8(Unknown Source)
> at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
> at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
> at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
> at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:402)
> at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:390)
> at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:102)
> at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:102)
> at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
> at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:102)
> at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:390)
> at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
> at org.apache.xbean.asm5.ClassReader.b(Unknown Source)
> at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
> at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
> at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$15.apply(ClosureCleaner.scala:224)
> at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$15.apply(ClosureCleaner.scala:223)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:223)
> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:2079)
> at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:112)
> {noformat}
> Creating this JIRA to explore the possibility of removing the cleaning step
> or marking it optional.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
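To illustrate the shape of the proposed change, here is a minimal, hypothetical sketch in plain Scala (not Spark's actual API; `cleanInitFunc`, `JobConfFunc`, and `expensiveClean` are stand-in names). It shows the pattern of gating the expensive clean call behind an opt-in flag, so callers that construct many HadoopRDDs with a known-serializable function could skip the ASM bytecode scan:

```scala
// Hypothetical sketch of an opt-out for closure cleaning.
// All names here are illustrative, not Spark's actual API.
object JobConfCleaningSketch {
  // Stand-in for JobConf => Unit
  type JobConfFunc = String => String

  // Counts how often the expensive path runs, for demonstration.
  var cleanCalls = 0

  // Stand-in for SparkContext.clean: in Spark this walks the closure's
  // bytecode with ASM, which is what dominates the thread stack above.
  def expensiveClean(f: JobConfFunc): JobConfFunc = {
    cleanCalls += 1
    f
  }

  // Mirrors the HadoopRDD constructor logic, with an added flag that
  // defaults to the current (always-clean) behavior.
  def initRdd(initLocalJobConfFuncOpt: Option[JobConfFunc],
              cleanInitFunc: Boolean = true): Option[JobConfFunc] = {
    if (cleanInitFunc) initLocalJobConfFuncOpt.map(expensiveClean)
    else initLocalJobConfFuncOpt
  }

  def main(args: Array[String]): Unit = {
    val f: JobConfFunc = conf => conf + ";io.sort.mb=256"
    initRdd(Some(f))                        // default path: cleaned
    initRdd(Some(f), cleanInitFunc = false) // opt-out: scan skipped
    println(cleanCalls)                     // prints 1
  }
}
```

Defaulting the flag to the current behavior would keep existing callers safe (cleaning also verifies serializability), while hot paths that create many RDDs per query could opt out.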