[
https://issues.apache.org/jira/browse/SPARK-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen resolved SPARK-7353.
------------------------------
Resolution: Invalid
General "why isn't this working?" questions are not valid JIRAs. Please review
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
However, you're not reporting a problem, it seems. There's nothing abnormal
about the heap growing. Until the GC kicks in -- and it won't until there's
memory pressure -- there could be lots of garbage around.
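
As a side illustration of that point (not part of this issue; the object and values below are purely hypothetical), a small Scala snippet shows how much of the apparent heap usage can be collectible garbage that simply has not been reclaimed yet:

{code}
// Illustration only: allocate short-lived objects, then compare used heap
// before and after an explicit collection. Until a GC actually runs, the
// "used" heap can include a large amount of garbage.
object HeapGarbageDemo {
  private def usedHeapMb: Long = {
    val rt = Runtime.getRuntime
    (rt.totalMemory - rt.freeMemory) / (1024L * 1024L)
  }

  def main(args: Array[String]): Unit = {
    var i = 0
    while (i < 100000) {
      new Array[Byte](1024) // becomes garbage immediately
      i += 1
    }
    println(s"Used heap before GC: $usedHeapMb MB")
    System.gc() // only a hint, but most JVMs honor it with a full collection
    println(s"Used heap after GC:  $usedHeapMb MB")
  }
}
{code}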
> Driver memory leak?
> -------------------
>
> Key: SPARK-7353
> URL: https://issues.apache.org/jira/browse/SPARK-7353
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.3.1
> Reporter: yangping wu
>
> Hi all, I am using Spark Streaming to read data from Kafka. My Spark version
> is 1.3.1, and the code is as follows:
> {code}
> import kafka.serializer.StringDecoder
>
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.streaming.{Seconds, StreamingContext}
> import org.apache.spark.streaming.kafka.KafkaUtils
>
> object Test {
>   def main(args: Array[String]): Unit = {
>     val brokerAddress =
>       "192.168.246.66:9092,192.168.246.67:9092,192.168.246.68:9092"
>     val kafkaParams = Map[String, String](
>       "metadata.broker.list" -> brokerAddress,
>       "group.id" -> args(0))
>     val sparkConf = new SparkConf().setAppName("Test")
>     sparkConf.set("spark.kryo.registrator", "utils.MyKryoSerializer")
>     val sc = new SparkContext(sparkConf)
>     val ssc = new StreamingContext(sc, Seconds(2))
>     val topicsSet = Set("sparktopic")
>     // Direct (receiver-less) Kafka stream, 2-second batches.
>     val messages = KafkaUtils.createDirectStream[String, String,
>       StringDecoder, StringDecoder](ssc, kafkaParams, topicsSet)
>     // The only work per batch: count the records of each non-empty RDD.
>     messages.foreachRDD { rdd =>
>       if (!rdd.isEmpty()) {
>         rdd.count()
>       }
>     }
>     ssc.start()
>     ssc.awaitTermination()
>     ssc.stop()
>   }
> }
> {code}
> The program has been running for about 120 hours. Below is the *jmap -histo:live* result
> for the program:
> {code}
> num #instances #bytes class name
> ----------------------------------------------
> 1: 30148 139357920 [B
> 2: 2102205 67270560 java.util.HashMap$Entry
> 3: 2143056 51433344 java.lang.Long
> 4: 520430 26570456 [C
> 5: 119224 15271104 <methodKlass>
> 6: 119224 14747984 <constMethodKlass>
> 7: 3449 13476384 [Ljava.util.HashMap$Entry;
> 8: 519132 12459168 java.lang.String
> 9: 9680 10855744 <constantPoolKlass>
> 10: 9680 9358856 <instanceKlassKlass>
> 11: 282624 6782976 io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
> 12: 8137 5778112 <constantPoolCacheKlass>
> 13: 120 3934080 [Lscala.concurrent.forkjoin.ForkJoinTask;
> 14: 71166 2846640 java.util.TreeMap$Entry
> 15: 6425 2545712 <methodDataKlass>
> 16: 10308 1224792 java.lang.Class
> 17: 640 1140736 [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry;
> 18: 22087 1060176 java.util.TreeMap
> 19: 19337 1014288 [[I
> 20: 16327 916376 [S
> 21: 17481 559392 java.util.concurrent.ConcurrentHashMap$HashEntry
> 22: 2235 548480 [I
> 23: 22000 528000 javax.management.openmbean.CompositeDataSupport
> {code}
> ([The jmap result
> screenshot|https://cloud.githubusercontent.com/assets/5170878/7465993/c9fc5b24-f30d-11e4-9276-ae635f850833.jpg])
> Note the *java.util.HashMap$Entry* and *java.lang.Long* objects: they are already using
> about 120 MB! I also found that, as time goes by, the *java.util.HashMap$Entry* and
> *java.lang.Long* objects occupy more and more memory, which will eventually cause an OOM
> on the driver side. But I don't know which component is causing this.
> Another program runs many jobs in one batch interval; the *jmap
> -histo:live* result for that program is:
> {code}
> num #instances #bytes class name
> ----------------------------------------------
> 1: 5256952 168222464 java.util.HashMap$Entry
> 2: 53317 144304808 [B
> 3: 5185499 124451976 java.lang.Long
> 4: 2456 39707888 [Ljava.util.HashMap$Entry;
> 5: 127343 16310384 <methodKlass>
> 6: 127343 15745680 <constMethodKlass>
> 7: 10403 11696960 <constantPoolKlass>
> 8: 10403 10103520 <instanceKlassKlass>
> 9: 69955 10046040 [Ljava.lang.Object;
> 10: 122628 7963480 [C
> 11: 72208 7458496 [Lscala.collection.mutable.HashEntry;
> 12: 282624 6782976 io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
> 13: 8824 6326208 <constantPoolCacheKlass>
> 14: 138037 5521480 org.apache.spark.storage.BlockStatus
> 15: 209132 5019168 scala.collection.mutable.DefaultEntry
> 16: 140 4589760 [Lscala.concurrent.forkjoin.ForkJoinTask;
> 17: 6591 3509992 [I
> 18: 8454 3275104 <methodDataKlass>
> 19: 135468 3251232 org.apache.spark.storage.RDDBlockId
> 20: 121176 2908224 java.lang.String
> 21: 72207 2888280 scala.collection.mutable.HashMap
> 22: 65714 2628560 scala.collection.mutable.HashSet
> 23: 17935 2008720 org.apache.spark.ui.jobs.UIData$ExecutorSummary
> 24: 121144 1938304 java.lang.Integer
> 25: 17987 1870648 org.apache.spark.executor.TaskMetrics
> 26: 65268 1566432 org.apache.spark.rdd.ShuffledRDDPartition
> 27: 11094 1319248 java.lang.Class
> 28: 17996 1295712 org.apache.spark.scheduler.TaskInfo
> 29: 37067 1186144 scala.Tuple4
> 30: 640 1140736 [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry;
> 31: 21033 1090832 [[I
> 32: 43517 1044408 scala.Tuple2
> 33: 17645 992808 [S
> 34: 40676 976224 scala.collection.immutable.$colon$colon
> 35: 60064 961024 scala.Some
> 36: 16118 902608 org.apache.spark.storage.RDDInfo
> {code}
> The *java.util.HashMap$Entry* and *java.lang.Long* objects are using about 280 MB of memory!
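> Several of the classes in this second histogram (e.g.
> *org.apache.spark.ui.jobs.UIData$ExecutorSummary*, *org.apache.spark.executor.TaskMetrics*,
> *org.apache.spark.storage.RDDInfo*) belong to the driver's retained UI data for completed
> jobs and stages, so one hypothesis worth testing is whether capping that retention changes
> the growth. A minimal sketch, assuming the standard *spark.ui.retainedJobs* /
> *spark.ui.retainedStages* settings and purely illustrative limits (not a confirmed cause or
> a recommendation):
> {code}
> // Hypothesis test only: limit how many completed jobs/stages the driver UI
> // keeps in memory. The values 200/200 are illustrative, not recommendations.
> val sparkConf = new SparkConf()
>   .setAppName("Test")
>   .set("spark.ui.retainedJobs", "200")
>   .set("spark.ui.retainedStages", "200")
> {code}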