[
https://issues.apache.org/jira/browse/SPARK-25285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luca Canali updated SPARK-25285:
--------------------------------
Description:
The motivation for these additional metrics is to help in troubleshooting and
monitoring task execution on a cluster. Currently available metrics include
executor threadpool metrics for completed tasks and for active tasks. Adding a
threadpool tasksStarted metric makes it possible, for example, to estimate the
(approximate) number of failed tasks by computing the difference: tasks
started – (active tasks + completed tasks and/or successfully finished tasks).
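The difference described above can be sketched as follows. This is only an
illustration of the arithmetic, not code from the PR: the function name, the
sample values, and the clamping to zero (to absorb values sampled at slightly
different times) are all assumptions made for the example.

```python
def approx_failed_tasks(started: int, active: int, finished: int) -> int:
    """Estimate failed tasks as: started - (active + finished).

    All three inputs are assumed to be readings of one executor's metrics.
    """
    # The readings are not taken atomically, so the raw difference can be
    # transiently negative; clamping to zero is an assumption of this sketch.
    return max(0, started - (active + finished))


# Hypothetical snapshot of one executor's metrics (illustrative values only).
snapshot = {"started": 120, "active": 8, "finished": 109}
estimate = approx_failed_tasks(**snapshot)
print(estimate)  # 3
```

The result is approximate by construction, which matches the "(approximate)"
qualifier in the description.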
The proposed metric finishedTasks is also intended for this type of
troubleshooting. The difference between finishedTasks and
threadpool.completeTasks is that the latter is a (Dropwizard library) gauge
taken from the threadpool, while the former is a (Dropwizard) counter computed
in the [[Executor]] class when a task successfully finishes, together with
several other task metrics counters.
Note that there are similarities with some of the metrics introduced in
SPARK-24398. However, there are key differences: this PR concerns the executor
source, so it provides metric values per executor, and the metric values do
not need to pass through the listener bus in this case.
was:
The motivation for these additional metrics is to help in troubleshooting
situations when tasks fail, are killed and/or restarted. Currently available
metrics include executor threadpool metrics for completed tasks and for active
tasks. Adding a threadpool tasksStarted metric makes it possible, for example,
to estimate the (approximate) number of failed tasks by computing the
difference: tasks started – (active tasks + completed tasks and/or
successfully completed tasks).
The proposed metric successfulTasks is also intended for this type of
troubleshooting. The difference between successfulTasks and
threadpool.completeTasks is that the latter is a (Dropwizard library) gauge
taken from the threadpool, while the former is a (Dropwizard) counter computed
in the [[Executor]] class when a task successfully completes, together with
several other task metrics counters.
> Add executor task metrics to track the number of tasks started and of tasks
> successfully completed
> --------------------------------------------------------------------------------------------------
>
> Key: SPARK-25285
> URL: https://issues.apache.org/jira/browse/SPARK-25285
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.4.0
> Reporter: Luca Canali
> Priority: Minor