Repository: spark
Updated Branches:
  refs/heads/master 5b761c537 -> 45c4ebc81
[SPARK-25170][DOC] Add list and short description of Spark Executor Task Metrics to the documentation.

## What changes were proposed in this pull request?

Add description of Executor Task Metrics to the documentation.

Closes #22397 from LucaCanali/docMonitoringTaskMetrics.

Authored-by: LucaCanali <luca.can...@cern.ch>
Signed-off-by: Sean Owen <sean.o...@databricks.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/45c4ebc8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/45c4ebc8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/45c4ebc8

Branch: refs/heads/master
Commit: 45c4ebc8171d75fc0d169bb8071a4c43263d283e
Parents: 5b761c5
Author: LucaCanali <luca.can...@cern.ch>
Authored: Thu Sep 13 10:19:21 2018 -0500
Committer: Sean Owen <sean.o...@databricks.com>
Committed: Thu Sep 13 10:19:21 2018 -0500

----------------------------------------------------------------------
 docs/monitoring.md | 152 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 152 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/45c4ebc8/docs/monitoring.md
----------------------------------------------------------------------
diff --git a/docs/monitoring.md b/docs/monitoring.md
index 2717dd0..f6d52ef 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -388,6 +388,158 @@ value triggering garbage collection on jobs, and `spark.ui.retainedStages` that
 Note that the garbage collection takes place on playback: it is possible to retrieve more entries by
 increasing these values and restarting the history server.
 
+### Executor Task Metrics
+
+The REST API exposes the values of the Task Metrics collected by Spark executors with the granularity
+of task execution. The metrics can be used for performance troubleshooting and workload characterization.
+A list of the available metrics, with a short description:
+
+<table class="table">
+  <tr><th>Spark Executor Task Metric name</th>
+      <th>Short description</th>
+  </tr>
+  <tr>
+    <td>executorRunTime</td>
+    <td>Elapsed time the executor spent running this task. This includes time fetching shuffle data.
+    The value is expressed in milliseconds.</td>
+  </tr>
+  <tr>
+    <td>executorCpuTime</td>
+    <td>CPU time the executor spent running this task. This includes time fetching shuffle data.
+    The value is expressed in nanoseconds.</td>
+  </tr>
+  <tr>
+    <td>executorDeserializeTime</td>
+    <td>Elapsed time spent to deserialize this task. The value is expressed in milliseconds.</td>
+  </tr>
+  <tr>
+    <td>executorDeserializeCpuTime</td>
+    <td>CPU time taken on the executor to deserialize this task. The value is expressed
+    in nanoseconds.</td>
+  </tr>
+  <tr>
+    <td>resultSize</td>
+    <td>The number of bytes this task transmitted back to the driver as the TaskResult.</td>
+  </tr>
+  <tr>
+    <td>jvmGCTime</td>
+    <td>Elapsed time the JVM spent in garbage collection while executing this task.
+    The value is expressed in milliseconds.</td>
+  </tr>
+  <tr>
+    <td>resultSerializationTime</td>
+    <td>Elapsed time spent serializing the task result. The value is expressed in milliseconds.</td>
+  </tr>
+  <tr>
+    <td>memoryBytesSpilled</td>
+    <td>The number of in-memory bytes spilled by this task.</td>
+  </tr>
+  <tr>
+    <td>diskBytesSpilled</td>
+    <td>The number of on-disk bytes spilled by this task.</td>
+  </tr>
+  <tr>
+    <td>peakExecutionMemory</td>
+    <td>Peak memory used by internal data structures created during shuffles, aggregations and
+    joins. The value of this accumulator should be approximately the sum of the peak sizes
+    across all such data structures created in this task. For SQL jobs, this only tracks all
+    unsafe operators and ExternalSort.</td>
+  </tr>
+  <tr>
+    <td>inputMetrics.*</td>
+    <td>Metrics related to reading data from <code>org.apache.spark.rdd.HadoopRDD</code>
+    or from persisted data.</td>
+  </tr>
+  <tr>
+    <td> .bytesRead</td>
+    <td>Total number of bytes read.</td>
+  </tr>
+  <tr>
+    <td> .recordsRead</td>
+    <td>Total number of records read.</td>
+  </tr>
+  <tr>
+    <td>outputMetrics.*</td>
+    <td>Metrics related to writing data externally (e.g. to a distributed filesystem),
+    defined only in tasks with output.</td>
+  </tr>
+  <tr>
+    <td> .bytesWritten</td>
+    <td>Total number of bytes written.</td>
+  </tr>
+  <tr>
+    <td> .recordsWritten</td>
+    <td>Total number of records written.</td>
+  </tr>
+  <tr>
+    <td>shuffleReadMetrics.*</td>
+    <td>Metrics related to shuffle read operations.</td>
+  </tr>
+  <tr>
+    <td> .recordsRead</td>
+    <td>Number of records read in shuffle operations.</td>
+  </tr>
+  <tr>
+    <td> .remoteBlocksFetched</td>
+    <td>Number of remote blocks fetched in shuffle operations.</td>
+  </tr>
+  <tr>
+    <td> .localBlocksFetched</td>
+    <td>Number of local (as opposed to read from a remote executor) blocks fetched
+    in shuffle operations.</td>
+  </tr>
+  <tr>
+    <td> .totalBlocksFetched</td>
+    <td>Number of blocks fetched in shuffle operations (both local and remote).</td>
+  </tr>
+  <tr>
+    <td> .remoteBytesRead</td>
+    <td>Number of remote bytes read in shuffle operations.</td>
+  </tr>
+  <tr>
+    <td> .localBytesRead</td>
+    <td>Number of bytes read in shuffle operations from local disk (as opposed to
+    read from a remote executor).</td>
+  </tr>
+  <tr>
+    <td> .totalBytesRead</td>
+    <td>Number of bytes read in shuffle operations (both local and remote).</td>
+  </tr>
+  <tr>
+    <td> .remoteBytesReadToDisk</td>
+    <td>Number of remote bytes read to disk in shuffle operations.
+    Large blocks are fetched to disk in shuffle read operations, as opposed to
+    being read into memory, which is the default behavior.</td>
+  </tr>
+  <tr>
+    <td> .fetchWaitTime</td>
+    <td>Time the task spent waiting for remote shuffle blocks.
+    This only includes the time blocking on shuffle input data.
+    For instance, if block B is being fetched while the task has not yet finished
+    processing block A, the task is not considered to be blocking on block B.
+    The value is expressed in milliseconds.</td>
+  </tr>
+  <tr>
+    <td>shuffleWriteMetrics.*</td>
+    <td>Metrics related to operations writing shuffle data.</td>
+  </tr>
+  <tr>
+    <td> .bytesWritten</td>
+    <td>Number of bytes written in shuffle operations.</td>
+  </tr>
+  <tr>
+    <td> .recordsWritten</td>
+    <td>Number of records written in shuffle operations.</td>
+  </tr>
+  <tr>
+    <td> .writeTime</td>
+    <td>Time spent blocking on writes to disk or buffer cache. The value is expressed
+    in nanoseconds.</td>
+  </tr>
+</table>
+
+
 ### API Versioning Policy
 
 These endpoints have been strongly versioned to make it easier to develop applications on top.

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
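P.S. The metrics documented in this patch come back as JSON from the REST API, so they are easy to post-process for the troubleshooting and workload characterization the section mentions. A minimal sketch in Python: the sample records and helper functions below are illustrative (not captured from a real application), and the field names simply follow the metric names in the table above; check the JSON your Spark version actually emits before relying on exact casing.

```python
# Illustrative task records shaped like the per-task metrics documented above.
# Values are made up; jvmGCTime/executorRunTime are milliseconds, byte counts are bytes.
sample_tasks = [
    {"taskMetrics": {"executorRunTime": 1200, "jvmGCTime": 150,
                     "shuffleReadMetrics": {"remoteBytesRead": 4096,
                                            "localBytesRead": 1024}}},
    {"taskMetrics": {"executorRunTime": 800, "jvmGCTime": 50,
                     "shuffleReadMetrics": {"remoteBytesRead": 2048,
                                            "localBytesRead": 512}}},
]

def total_shuffle_bytes_read(tasks):
    """Sum local + remote shuffle bytes read across all tasks."""
    total = 0
    for t in tasks:
        sr = t.get("taskMetrics", {}).get("shuffleReadMetrics", {})
        total += sr.get("remoteBytesRead", 0) + sr.get("localBytesRead", 0)
    return total

def gc_fraction(tasks):
    """Fraction of total elapsed run time spent in GC (both in milliseconds)."""
    run = sum(t["taskMetrics"]["executorRunTime"] for t in tasks)
    gc = sum(t["taskMetrics"]["jvmGCTime"] for t in tasks)
    return gc / run if run else 0.0

print(total_shuffle_bytes_read(sample_tasks))  # 7680
print(gc_fraction(sample_tasks))               # 0.1
```

A high GC fraction or a large remote-to-local shuffle read ratio are the kinds of signals this style of aggregation surfaces quickly.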