yangping wu created SPARK-7353:
----------------------------------

             Summary: Driver memory leak?
                 Key: SPARK-7353
                 URL: https://issues.apache.org/jira/browse/SPARK-7353
             Project: Spark
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.3.1
            Reporter: yangping wu


Hi all, I am using Spark Streaming to read data from Kafka. My Spark version is 1.3.1, and the code is as follows:
{code}
import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object Test {
  def main(args: Array[String]) {
    val brokerAddress = "192.168.246.66:9092,192.168.246.67:9092,192.168.246.68:9092"
    val kafkaParams = Map[String, String](
      "metadata.broker.list" -> brokerAddress,
      "group.id" -> args(0))

    val sparkConf = new SparkConf().setAppName("Test")
    sparkConf.set("spark.kryo.registrator", "utils.MyKryoSerializer")
    val sc = new SparkContext(sparkConf)

    val ssc = new StreamingContext(sc, Seconds(2))
    val topicsSet = Set("sparktopic")

    // Direct (receiver-less) Kafka stream with a 2-second batch interval.
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicsSet)

    // Each batch only counts the records; nothing is cached or collected on the driver.
    messages.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        rdd.count()
      }
    }

    ssc.start()
    ssc.awaitTermination()
    ssc.stop()
  }
}
{code}
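
A minimal way to watch the driver heap from inside the job (purely a diagnostic sketch, not part of the original code; the logDriverHeap helper is made up for illustration) could look like this:
{code}
import java.lang.management.ManagementFactory

// Prints the current driver heap usage; calling it once per batch shows whether
// the used heap keeps climbing between batches.
def logDriverHeap(): Unit = {
  val heap = ManagementFactory.getMemoryMXBean.getHeapMemoryUsage
  println(s"driver heap used: ${heap.getUsed / (1024 * 1024)} MB of max ${heap.getMax / (1024 * 1024)} MB")
}

// Same foreachRDD as above, with the heap check added at the end of each batch.
messages.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    rdd.count()
  }
  logDriverHeap()
}
{code}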
The program has already been running for about 120 hours. Below is the *jmap -histo:live* result for the program:
{code}
num     #instances         #bytes  class name
----------------------------------------------
   1:         30148      139357920  [B
   2:       2102205       67270560  java.util.HashMap$Entry
   3:       2143056       51433344  java.lang.Long
   4:        520430       26570456  [C
   5:        119224       15271104  <methodKlass>
   6:        119224       14747984  <constMethodKlass>
   7:          3449       13476384  [Ljava.util.HashMap$Entry;
   8:        519132       12459168  java.lang.String
   9:          9680       10855744  <constantPoolKlass>
  10:          9680        9358856  <instanceKlassKlass>
  11:        282624        6782976  io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
  12:          8137        5778112  <constantPoolCacheKlass>
  13:           120        3934080  [Lscala.concurrent.forkjoin.ForkJoinTask;
  14:         71166        2846640  java.util.TreeMap$Entry
  15:          6425        2545712  <methodDataKlass>
  16:         10308        1224792  java.lang.Class
  17:           640        1140736  [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry;
  18:         22087        1060176  java.util.TreeMap
  19:         19337        1014288  [[I
  20:         16327         916376  [S
  21:         17481         559392  java.util.concurrent.ConcurrentHashMap$HashEntry
  22:          2235         548480  [I
  23:         22000         528000  
javax.management.openmbean.CompositeDataSupport
{code}
([The jmap result screenshot|https://cloud.githubusercontent.com/assets/5170878/7465993/c9fc5b24-f30d-11e4-9276-ae635f850833.jpg]) Note the *java.util.HashMap$Entry* and *java.lang.Long* objects: together they already use about 120MB! I also found that, as time goes by, the *java.util.HashMap$Entry* and *java.lang.Long* objects occupy more and more memory, which will eventually cause an OOM on the driver side. But I don't know which component causes this problem.
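
If the growth turns out to come from driver-side job/stage bookkeeping for the UI (only a guess, since I have not identified the component), lowering the retained metadata might be worth testing. A sketch, assuming the spark.ui.retainedJobs and spark.ui.retainedStages settings behave as documented:
{code}
import org.apache.spark.SparkConf

// Untested mitigation idea: keep less job/stage metadata on the driver for the UI.
// The value 200 is arbitrary; the defaults are larger.
val sparkConf = new SparkConf()
  .setAppName("Test")
  .set("spark.kryo.registrator", "utils.MyKryoSerializer")
  .set("spark.ui.retainedJobs", "200")
  .set("spark.ui.retainedStages", "200")
{code}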




