Shixiong Zhu created SPARK-27468:
------------------------------------

             Summary: "Storage Level" in "RDD Storage Page" is not correct
                 Key: SPARK-27468
                 URL: https://issues.apache.org/jira/browse/SPARK-27468
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.1
            Reporter: Shixiong Zhu


I ran the following unit test and checked the UI.
{code}
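    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    // Two executors (1 core, 1024 MB each) so the block can be replicated.
    val conf = new SparkConf()
      .setAppName("test")
      .setMaster("local-cluster[2,1,1024]")
      .set("spark.ui.enabled", "true")
    // sc is assumed to be a var declared by the enclosing test suite.
    sc = new SparkContext(conf)
    // Persist a single-partition RDD with 2x replication, then materialize it.
    val rdd = sc.makeRDD(1 to 10, 1).persist(StorageLevel.MEMORY_ONLY_2)
    rdd.count()
    // Keep the application alive for an hour so the UI can be inspected.
    Thread.sleep(3600000)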
{code}

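The RDD storage page shows the storage level as "Memory Deserialized 1x 
Replicated", even though the RDD was persisted with 
StorageLevel.MEMORY_ONLY_2, so 2x replication is expected.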

I tried to debug and found this is because Spark emitted the following two 
events:
{code}
event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, 10.8.132.160, 65473, None),rdd_0_0,StorageLevel(memory, deserialized, 2 replicas),56,0))
event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, 10.8.132.160, 65474, None),rdd_0_0,StorageLevel(memory, deserialized, 1 replicas),56,0))
{code}

The storage level in the second event will overwrite the first one. "1 
replicas" comes from this line: 
https://github.com/apache/spark/blob/3ab96d7acf870e53c9016b0b63d0b328eec23bed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1457
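
To illustrate the overwrite, here is a minimal, self-contained sketch. It 
does not use Spark's real classes: BlockEvent and LastWriterWins are 
hypothetical names, and the model only assumes that the UI keeps the storage 
level from the most recent event for a block.

```scala
// Hypothetical, simplified model of a block-update event; not Spark's real class.
final case class BlockEvent(executorId: String, blockId: String, replication: Int)

object LastWriterWins {
  // Keeps only the replication carried by the most recent event per block,
  // mimicking the overwrite described in the report.
  def displayedReplication(events: Seq[BlockEvent]): Map[String, Int] =
    events.foldLeft(Map.empty[String, Int]) { (acc, e) =>
      acc.updated(e.blockId, e.replication)
    }

  def main(args: Array[String]): Unit = {
    val first  = BlockEvent("1", "rdd_0_0", replication = 2)
    val second = BlockEvent("0", "rdd_0_0", replication = 1)
    // Arrival order from the report: the "1 replicas" event wins.
    println(displayedReplication(Seq(first, second))("rdd_0_0")) // prints 1
    // Reversed arrival order: the "2 replicas" event wins.
    println(displayedReplication(Seq(second, first))("rdd_0_0")) // prints 2
  }
}
```

Under this model the displayed value depends only on which event happens to 
arrive last.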

Maybe AppStatusListener should calculate the replicas from events?

Another thing we may need to consider: when the replication factor is 2, will 
the two Spark events always arrive in the same order? Currently, two RPCs 
from different executors can arrive in any order.
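
One possible shape for the suggestion above, as a sketch only: count the 
distinct executors that currently hold a block instead of trusting the 
replication field of any single event. BlockUpdate, ReplicaTracker and the 
stored flag are hypothetical and are not part of AppStatusListener; the flag 
stands in for whether the event reports the block as stored or dropped.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for a block-update event; not Spark's real class.
// `stored` models whether the executor reports the block as held (true) or dropped (false).
final case class BlockUpdate(executorId: String, blockId: String, stored: Boolean)

// Tracks which executors currently hold each block.
final class ReplicaTracker {
  private val holders = mutable.Map.empty[String, Set[String]]

  def onBlockUpdated(e: BlockUpdate): Unit = {
    val current = holders.getOrElse(e.blockId, Set.empty[String])
    val updated = if (e.stored) current + e.executorId else current - e.executorId
    if (updated.isEmpty) holders.remove(e.blockId) else holders.update(e.blockId, updated)
  }

  // Replication derived from events: number of distinct executors holding the block.
  def replicas(blockId: String): Int = holders.get(blockId).map(_.size).getOrElse(0)
}

object ReplicaTrackerDemo {
  def main(args: Array[String]): Unit = {
    val events = Seq(
      BlockUpdate("1", "rdd_0_0", stored = true),
      BlockUpdate("0", "rdd_0_0", stored = true)
    )
    // Replay the two events in both arrival orders.
    for (ordering <- Seq(events, events.reverse)) {
      val tracker = new ReplicaTracker
      ordering.foreach(tracker.onBlockUpdated)
      println(tracker.replicas("rdd_0_0")) // prints 2 for either order
    }
  }
}
```

Because the count is a set size, it does not depend on which executor's 
"stored" event arrives first. Reordering a store and a drop from the same 
executor would still need separate handling.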



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
