[
https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun resolved SPARK-33906.
-----------------------------------
Fix Version/s: 3.1.0
Resolution: Fixed
Issue resolved by pull request 30920
[https://github.com/apache/spark/pull/30920]
> SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset
> -----------------------------------------------------------------------------
>
> Key: SPARK-33906
> URL: https://issues.apache.org/jira/browse/SPARK-33906
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 3.2.0
> Reporter: Baohe Zhang
> Assignee: Baohe Zhang
> Priority: Blocker
> Fix For: 3.1.0
>
> Attachments: executor-page.png
>
>
> How to reproduce it?
> In macOS standalone mode, open a spark-shell:
> $SPARK_HOME/bin/spark-shell --master spark://localhost:7077
> and run:
> {code:scala}
> val x = sc.makeRDD(1 to 100000, 5)
> x.count()
> {code}
> Then open the application UI in the browser and click the Executors page; it
> gets stuck on this page:
> !executor-page.png!
> The JSON returned by the REST API endpoint
> http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors
> is also missing "peakMemoryMetrics" for executors:
> {noformat}
> [ {
> "id" : "driver",
> "hostPort" : "192.168.1.241:50042",
> "isActive" : true,
> "rddBlocks" : 0,
> "memoryUsed" : 0,
> "diskUsed" : 0,
> "totalCores" : 0,
> "maxTasks" : 0,
> "activeTasks" : 0,
> "failedTasks" : 0,
> "completedTasks" : 0,
> "totalTasks" : 0,
> "totalDuration" : 0,
> "totalGCTime" : 0,
> "totalInputBytes" : 0,
> "totalShuffleRead" : 0,
> "totalShuffleWrite" : 0,
> "isBlacklisted" : false,
> "maxMemory" : 455501414,
> "addTime" : "2020-12-24T19:44:18.033GMT",
> "executorLogs" : { },
> "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
> },
> "blacklistedInStages" : [ ],
> "peakMemoryMetrics" : {
> "JVMHeapMemory" : 135021152,
> "JVMOffHeapMemory" : 149558576,
> "OnHeapExecutionMemory" : 0,
> "OffHeapExecutionMemory" : 0,
> "OnHeapStorageMemory" : 3301,
> "OffHeapStorageMemory" : 0,
> "OnHeapUnifiedMemory" : 3301,
> "OffHeapUnifiedMemory" : 0,
> "DirectPoolMemory" : 67963178,
> "MappedPoolMemory" : 0,
> "ProcessTreeJVMVMemory" : 0,
> "ProcessTreeJVMRSSMemory" : 0,
> "ProcessTreePythonVMemory" : 0,
> "ProcessTreePythonRSSMemory" : 0,
> "ProcessTreeOtherVMemory" : 0,
> "ProcessTreeOtherRSSMemory" : 0,
> "MinorGCCount" : 15,
> "MinorGCTime" : 101,
> "MajorGCCount" : 0,
> "MajorGCTime" : 0
> },
> "attributes" : { },
> "resources" : { },
> "resourceProfileId" : 0,
> "isExcluded" : false,
> "excludedInStages" : [ ]
> }, {
> "id" : "0",
> "hostPort" : "192.168.1.241:50054",
> "isActive" : true,
> "rddBlocks" : 0,
> "memoryUsed" : 0,
> "diskUsed" : 0,
> "totalCores" : 12,
> "maxTasks" : 12,
> "activeTasks" : 0,
> "failedTasks" : 0,
> "completedTasks" : 5,
> "totalTasks" : 5,
> "totalDuration" : 2107,
> "totalGCTime" : 25,
> "totalInputBytes" : 0,
> "totalShuffleRead" : 0,
> "totalShuffleWrite" : 0,
> "isBlacklisted" : false,
> "maxMemory" : 455501414,
> "addTime" : "2020-12-24T19:44:20.335GMT",
> "executorLogs" : {
> "stdout" : "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003&executorId=0&logType=stdout",
> "stderr" : "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003&executorId=0&logType=stderr"
> },
> "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
> },
> "blacklistedInStages" : [ ],
> "attributes" : { },
> "resources" : { },
> "resourceProfileId" : 0,
> "isExcluded" : false,
> "excludedInStages" : [ ]
> } ]
> {noformat}
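> The missing field can be confirmed programmatically. A minimal sketch (the function name is illustrative, not from the Spark codebase) that, given the parsed JSON array from the /executors endpoint above, lists the ids of executors whose summary lacks peakMemoryMetrics:

```javascript
// Given the parsed JSON array returned by
// /api/v1/applications/<app-id>/executors, return the ids of
// executors whose summary has no peakMemoryMetrics field.
function executorsMissingPeakMetrics(executors) {
  return executors
    .filter((e) => e.peakMemoryMetrics === undefined)
    .map((e) => e.id);
}

// For the response above this returns [ "0" ]: the driver entry has
// peakMemoryMetrics, while executor "0" does not.
```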
> I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates
> returns an empty map, which causes peakExecutorMetrics to be set to None in
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
> The likely reason for the empty map is that the stage completed in less time
> than the heartbeat interval, so the stage entry in stageTCMP had already been
> removed before reportHeartbeat was called.
> How to fix it?
> Check whether peakMemoryMetrics is undefined in executorspage.js.
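> Roughly, the guard could look like this; a hedged sketch, not the actual patch in pull request 30920 (the helper name is hypothetical):

```javascript
// Hypothetical helper for executorspage.js: read a peak memory metric
// defensively, falling back to 0 when peakMemoryMetrics is unset
// (e.g. the stage finished before the first heartbeat carrying metrics).
function peakMetricValue(executorSummary, metricName) {
  const peak = executorSummary.peakMemoryMetrics;
  if (peak === undefined || peak[metricName] === undefined) {
    return 0;
  }
  return peak[metricName];
}
```

> With a check like this the table renderer shows 0 instead of throwing on undefined and leaving the page stuck.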
--
This message was sent by Atlassian Jira
(v8.3.4#803005)