[ https://issues.apache.org/jira/browse/SPARK-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-7353.
------------------------------
    Resolution: Invalid

General "why isn't this working?" questions are not valid JIRAs. Please review 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

However, you're not reporting a problem, it seems. There's nothing abnormal 
about the heap growing. Until the GC kicks in -- and it won't until there's 
memory pressure -- there could be lots of garbage around.
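
As a quick way to check that (a minimal sketch, not part of the original report): logging heap usage right after an explicit GC approximates what *jmap -histo:live* reports, so only live data that keeps growing across batches would point at a real leak rather than pending garbage.

{code}
// Hypothetical helper, assumed to be called periodically from the driver
// (e.g. inside foreachRDD). System.gc() is only a request, but it is the
// same full-GC trigger that jmap -histo:live relies on before counting.
import java.lang.management.ManagementFactory

object HeapProbe {
  def logLiveHeap(tag: String): Unit = {
    System.gc()
    val heap = ManagementFactory.getMemoryMXBean.getHeapMemoryUsage
    val usedMb = heap.getUsed / (1024 * 1024)
    val maxMb  = heap.getMax / (1024 * 1024)
    println(s"[$tag] live heap after GC: $usedMb MB of $maxMb MB")
  }
}
{code}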

> Driver memory leak?
> -------------------
>
>                 Key: SPARK-7353
>                 URL: https://issues.apache.org/jira/browse/SPARK-7353
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.3.1
>            Reporter: yangping wu
>
> Hi all, I am using Spark Streaming to read data from Kafka. My Spark version
> is 1.3.1, and the code is as follows:
> {code}
> import kafka.serializer.StringDecoder
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.streaming.{Seconds, StreamingContext}
> import org.apache.spark.streaming.kafka.KafkaUtils
>
> object Test {
>   def main(args: Array[String]) {
>     val brokerAddress = "192.168.246.66:9092,192.168.246.67:9092,192.168.246.68:9092"
>     val kafkaParams = Map[String, String](
>       "metadata.broker.list" -> brokerAddress,
>       "group.id" -> args(0))
>     val sparkConf = new SparkConf().setAppName("Test")
>     sparkConf.set("spark.kryo.registrator", "utils.MyKryoSerializer")
>     val sc = new SparkContext(sparkConf)
>     val ssc = new StreamingContext(sc, Seconds(2))
>     val topicsSet = Set("sparktopic")
>     // Direct (receiver-less) Kafka stream; each batch just counts non-empty RDDs
>     val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicsSet)
>     messages.foreachRDD(rdd => {
>       if (!rdd.isEmpty()) {
>         rdd.count()
>       }
>     })
>     ssc.start()
>     ssc.awaitTermination()
>     ssc.stop()
>   }
> }
> {code}
> The program has already been running for about 120 hours. Below is the *jmap -histo:live* result
> for the program:
> {code}
> num     #instances         #bytes  class name
> ----------------------------------------------
>    1:         30148      139357920  [B
>    2:       2102205       67270560  java.util.HashMap$Entry
>    3:       2143056       51433344  java.lang.Long
>    4:        520430       26570456  [C
>    5:        119224       15271104  <methodKlass>
>    6:        119224       14747984  <constMethodKlass>
>    7:          3449       13476384  [Ljava.util.HashMap$Entry;
>    8:        519132       12459168  java.lang.String
>    9:          9680       10855744  <constantPoolKlass>
>   10:          9680        9358856  <instanceKlassKlass>
>   11:        282624        6782976  io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
>   12:          8137        5778112  <constantPoolCacheKlass>
>   13:           120        3934080  [Lscala.concurrent.forkjoin.ForkJoinTask;
>   14:         71166        2846640  java.util.TreeMap$Entry
>   15:          6425        2545712  <methodDataKlass>
>   16:         10308        1224792  java.lang.Class
>   17:           640        1140736  [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry;
>   18:         22087        1060176  java.util.TreeMap
>   19:         19337        1014288  [[I
>   20:         16327         916376  [S
>   21:         17481         559392  java.util.concurrent.ConcurrentHashMap$HashEntry
>   22:          2235         548480  [I
>   23:         22000         528000  javax.management.openmbean.CompositeDataSupport
> {code}
> ([The jmap result screenshot|https://cloud.githubusercontent.com/assets/5170878/7465993/c9fc5b24-f30d-11e4-9276-ae635f850833.jpg]) Note the *java.util.HashMap$Entry* and *java.lang.Long* objects: they are already using about 120MB! I also found that, as time goes by, the *java.util.HashMap$Entry* and *java.lang.Long* objects occupy more and more memory, which will eventually cause an OOM on the driver side. But I don't know which component causes this problem.
> Another program runs many jobs in one batch interval; the *jmap -histo:live* result for that program is:
> {code}
> num     #instances         #bytes  class name
> ----------------------------------------------
>    1:       5256952      168222464  java.util.HashMap$Entry
>    2:         53317      144304808  [B
>    3:       5185499      124451976  java.lang.Long
>    4:          2456       39707888  [Ljava.util.HashMap$Entry;
>    5:        127343       16310384  <methodKlass>
>    6:        127343       15745680  <constMethodKlass>
>    7:         10403       11696960  <constantPoolKlass>
>    8:         10403       10103520  <instanceKlassKlass>
>    9:         69955       10046040  [Ljava.lang.Object;
>   10:        122628        7963480  [C
>   11:         72208        7458496  [Lscala.collection.mutable.HashEntry;
>   12:        282624        6782976  io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
>   13:          8824        6326208  <constantPoolCacheKlass>
>   14:        138037        5521480  org.apache.spark.storage.BlockStatus
>   15:        209132        5019168  scala.collection.mutable.DefaultEntry
>   16:           140        4589760  [Lscala.concurrent.forkjoin.ForkJoinTask;
>   17:          6591        3509992  [I
>   18:          8454        3275104  <methodDataKlass>
>   19:        135468        3251232  org.apache.spark.storage.RDDBlockId
>   20:        121176        2908224  java.lang.String
>   21:         72207        2888280  scala.collection.mutable.HashMap
>   22:         65714        2628560  scala.collection.mutable.HashSet
>   23:         17935        2008720  org.apache.spark.ui.jobs.UIData$ExecutorSummary
>   24:        121144        1938304  java.lang.Integer
>   25:         17987        1870648  org.apache.spark.executor.TaskMetrics
>   26:         65268        1566432  org.apache.spark.rdd.ShuffledRDDPartition
>   27:         11094        1319248  java.lang.Class
>   28:         17996        1295712  org.apache.spark.scheduler.TaskInfo
>   29:         37067        1186144  scala.Tuple4
>   30:           640        1140736  [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry;
>   31:         21033        1090832  [[I
>   32:         43517        1044408  scala.Tuple2
>   33:         17645         992808  [S
>   34:         40676         976224  scala.collection.immutable.$colon$colon
>   35:         60064         961024  scala.Some
>   36:         16118         902608  org.apache.spark.storage.RDDInfo
> {code}
> The *java.util.HashMap$Entry* and *java.lang.Long* objects are using 280MB of memory!


