[ 
https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Baohe Zhang updated SPARK-33906:
--------------------------------
    Description: 
How to reproduce it?

On macOS in standalone mode, open a spark-shell and run:

$SPARK_HOME/bin/spark-shell --master spark://localhost:7077
{code:scala}
val x = sc.makeRDD(1 to 100000, 5)
x.count()
{code}
Then open the app UI in the browser and click the Executors page. The page 
gets stuck at this point: 

 !executor-page.png! 

Also, the JSON returned by the REST API endpoint 
http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
is missing "peakMemoryMetrics" for executor "0" (only the driver entry has it).
{noformat}
[ {
  "id" : "driver",
  "hostPort" : "192.168.1.241:50042",
  "isActive" : true,
  "rddBlocks" : 0,
  "memoryUsed" : 0,
  "diskUsed" : 0,
  "totalCores" : 0,
  "maxTasks" : 0,
  "activeTasks" : 0,
  "failedTasks" : 0,
  "completedTasks" : 0,
  "totalTasks" : 0,
  "totalDuration" : 0,
  "totalGCTime" : 0,
  "totalInputBytes" : 0,
  "totalShuffleRead" : 0,
  "totalShuffleWrite" : 0,
  "isBlacklisted" : false,
  "maxMemory" : 455501414,
  "addTime" : "2020-12-24T19:44:18.033GMT",
  "executorLogs" : { },
  "memoryMetrics" : {
    "usedOnHeapStorageMemory" : 0,
    "usedOffHeapStorageMemory" : 0,
    "totalOnHeapStorageMemory" : 455501414,
    "totalOffHeapStorageMemory" : 0
  },
  "blacklistedInStages" : [ ],
  "peakMemoryMetrics" : {
    "JVMHeapMemory" : 135021152,
    "JVMOffHeapMemory" : 149558576,
    "OnHeapExecutionMemory" : 0,
    "OffHeapExecutionMemory" : 0,
    "OnHeapStorageMemory" : 3301,
    "OffHeapStorageMemory" : 0,
    "OnHeapUnifiedMemory" : 3301,
    "OffHeapUnifiedMemory" : 0,
    "DirectPoolMemory" : 67963178,
    "MappedPoolMemory" : 0,
    "ProcessTreeJVMVMemory" : 0,
    "ProcessTreeJVMRSSMemory" : 0,
    "ProcessTreePythonVMemory" : 0,
    "ProcessTreePythonRSSMemory" : 0,
    "ProcessTreeOtherVMemory" : 0,
    "ProcessTreeOtherRSSMemory" : 0,
    "MinorGCCount" : 15,
    "MinorGCTime" : 101,
    "MajorGCCount" : 0,
    "MajorGCTime" : 0
  },
  "attributes" : { },
  "resources" : { },
  "resourceProfileId" : 0,
  "isExcluded" : false,
  "excludedInStages" : [ ]
}, {
  "id" : "0",
  "hostPort" : "192.168.1.241:50054",
  "isActive" : true,
  "rddBlocks" : 0,
  "memoryUsed" : 0,
  "diskUsed" : 0,
  "totalCores" : 12,
  "maxTasks" : 12,
  "activeTasks" : 0,
  "failedTasks" : 0,
  "completedTasks" : 5,
  "totalTasks" : 5,
  "totalDuration" : 2107,
  "totalGCTime" : 25,
  "totalInputBytes" : 0,
  "totalShuffleRead" : 0,
  "totalShuffleWrite" : 0,
  "isBlacklisted" : false,
  "maxMemory" : 455501414,
  "addTime" : "2020-12-24T19:44:20.335GMT",
  "executorLogs" : {
    "stdout" : "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003&executorId=0&logType=stdout",
    "stderr" : "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003&executorId=0&logType=stderr"
  },
  "memoryMetrics" : {
    "usedOnHeapStorageMemory" : 0,
    "usedOffHeapStorageMemory" : 0,
    "totalOnHeapStorageMemory" : 455501414,
    "totalOffHeapStorageMemory" : 0
  },
  "blacklistedInStages" : [ ],
  "attributes" : { },
  "resources" : { },
  "resourceProfileId" : 0,
  "isExcluded" : false,
  "excludedInStages" : [ ]
} ]
{noformat}
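
The gap in the response above can be modelled with an optional field. The following is a minimal sketch, not Spark's own code: ExecutorRow, PeakMetricsCheck, missingPeakMetrics and peakOrZero are hypothetical names introduced here for illustration, and only two of the driver's metrics are copied from the response.

```scala
// Simplified, hypothetical stand-in for one entry of the /executors response.
// Only the two fields needed for the check are modelled.
final case class ExecutorRow(id: String, peakMemoryMetrics: Option[Map[String, Long]])

object PeakMetricsCheck {
  // IDs of executors whose entry carries no "peakMemoryMetrics" object.
  def missingPeakMetrics(rows: Seq[ExecutorRow]): Seq[String] =
    rows.filter(_.peakMemoryMetrics.isEmpty).map(_.id)

  // Defensive read: fall back to 0 when the object or the key is absent.
  def peakOrZero(row: ExecutorRow, metric: String): Long =
    row.peakMemoryMetrics.flatMap(_.get(metric)).getOrElse(0L)

  // Values taken from the response above (driver metrics abbreviated).
  val rows: Seq[ExecutorRow] = Seq(
    ExecutorRow("driver", Some(Map("JVMHeapMemory" -> 135021152L, "MinorGCCount" -> 15L))),
    ExecutorRow("0", None)
  )
}
```

A consumer that reads the field through something like peakOrZero would show zeros for executor "0" instead of failing on the absent object. Whether the Executors page should default to zeros or hide the columns is a design choice this sketch does not settle.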

I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates returns 
an empty map, which causes peakExecutorMetrics to be set to None in 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
 A possible reason for the empty map is that the stage completes in less time 
than the heartbeat interval, so the stage entry in stageTCMP has already been 
removed before reportHeartbeat is called.
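
The suspected race can be illustrated with a toy model. This is a sketch under stated assumptions, not Spark's ExecutorMetricsPoller: ToyPoller, HeartbeatRace, peakFrom, shortStage and longStage are hypothetical names, the per-stage state is reduced to a task count and a single peak value, and the value 42 is arbitrary.

```scala
import scala.collection.mutable

// Toy model only: names and behaviour are simplified assumptions, not Spark's code.
final class ToyPoller {
  // stageId -> (number of running tasks, peak value seen so far)
  private val stageTCMP = mutable.Map.empty[Int, (Int, Long)]

  def onTaskStart(stageId: Int): Unit = {
    val (count, peak) = stageTCMP.getOrElse(stageId, (0, 0L))
    stageTCMP(stageId) = (count + 1, peak)
  }

  // Record a polled value for a stage that still has an entry.
  def poll(stageId: Int, value: Long): Unit =
    stageTCMP.get(stageId).foreach { case (count, peak) =>
      stageTCMP(stageId) = (count, math.max(peak, value))
    }

  // Drop the stage entry once its last task completes.
  def onTaskCompletion(stageId: Int): Unit =
    stageTCMP.get(stageId).foreach { case (count, peak) =>
      if (count <= 1) stageTCMP.remove(stageId)
      else stageTCMP(stageId) = (count - 1, peak)
    }

  // Snapshot that a heartbeat would send to the driver.
  def getExecutorUpdates(): Map[Int, Long] =
    stageTCMP.map { case (stageId, (_, peak)) => stageId -> peak }.toMap
}

object HeartbeatRace {
  // An empty update map leaves the peak unset, mirroring the None in the report.
  def peakFrom(updates: Map[Int, Long]): Option[Long] =
    if (updates.isEmpty) None else Some(updates.values.max)

  // The stage finishes before the heartbeat fires: its entry is already gone.
  def shortStage(): Option[Long] = {
    val poller = new ToyPoller
    poller.onTaskStart(0)
    poller.poll(0, 42L)
    poller.onTaskCompletion(0)
    peakFrom(poller.getExecutorUpdates())
  }

  // The heartbeat fires while the task is still running: the peak is reported.
  def longStage(): Option[Long] = {
    val poller = new ToyPoller
    poller.onTaskStart(0)
    poller.poll(0, 42L)
    peakFrom(poller.getExecutorUpdates())
  }
}
```

In this model, shortStage() loses the polled value because the entry is removed before the snapshot is taken, while longStage() reports it. If the real poller behaves the same way, any job whose stages all finish within one heartbeat interval would leave the executor without peak metrics.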


> SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-33906
>                 URL: https://issues.apache.org/jira/browse/SPARK-33906
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 3.2.0
>            Reporter: Baohe Zhang
>            Priority: Major
>         Attachments: executor-page.png
>
>
> How to reproduce it?
> On macOS in standalone mode, open a spark-shell and run:
> $SPARK_HOME/bin/spark-shell --master spark://localhost:7077
> {code:scala}
> val x = sc.makeRDD(1 to 100000, 5)
> x.count()
> {code}
> Then open the app UI in the browser and click the Executors page. The page 
> gets stuck at this point: 
>  !executor-page.png! 
> Also, the JSON returned by the REST API endpoint 
> http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
> is missing "peakMemoryMetrics" for executor "0" (only the driver entry has it).
> {noformat}
> [ {
>   "id" : "driver",
>   "hostPort" : "192.168.1.241:50042",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 0,
>   "maxTasks" : 0,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 0,
>   "totalTasks" : 0,
>   "totalDuration" : 0,
>   "totalGCTime" : 0,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:18.033GMT",
>   "executorLogs" : { },
>   "memoryMetrics" : {
>     "usedOnHeapStorageMemory" : 0,
>     "usedOffHeapStorageMemory" : 0,
>     "totalOnHeapStorageMemory" : 455501414,
>     "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "peakMemoryMetrics" : {
>     "JVMHeapMemory" : 135021152,
>     "JVMOffHeapMemory" : 149558576,
>     "OnHeapExecutionMemory" : 0,
>     "OffHeapExecutionMemory" : 0,
>     "OnHeapStorageMemory" : 3301,
>     "OffHeapStorageMemory" : 0,
>     "OnHeapUnifiedMemory" : 3301,
>     "OffHeapUnifiedMemory" : 0,
>     "DirectPoolMemory" : 67963178,
>     "MappedPoolMemory" : 0,
>     "ProcessTreeJVMVMemory" : 0,
>     "ProcessTreeJVMRSSMemory" : 0,
>     "ProcessTreePythonVMemory" : 0,
>     "ProcessTreePythonRSSMemory" : 0,
>     "ProcessTreeOtherVMemory" : 0,
>     "ProcessTreeOtherRSSMemory" : 0,
>     "MinorGCCount" : 15,
>     "MinorGCTime" : 101,
>     "MajorGCCount" : 0,
>     "MajorGCTime" : 0
>   },
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> }, {
>   "id" : "0",
>   "hostPort" : "192.168.1.241:50054",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 12,
>   "maxTasks" : 12,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 5,
>   "totalTasks" : 5,
>   "totalDuration" : 2107,
>   "totalGCTime" : 25,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:20.335GMT",
>   "executorLogs" : {
>     "stdout" : "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003&executorId=0&logType=stdout",
>     "stderr" : "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003&executorId=0&logType=stderr"
>   },
>   "memoryMetrics" : {
>     "usedOnHeapStorageMemory" : 0,
>     "usedOffHeapStorageMemory" : 0,
>     "totalOnHeapStorageMemory" : 455501414,
>     "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> } ]
> {noformat}
> I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates returns 
> an empty map, which causes peakExecutorMetrics to be set to None in 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
>  A possible reason for the empty map is that the stage completes in less time 
> than the heartbeat interval, so the stage entry in stageTCMP has already been 
> removed before reportHeartbeat is called.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
