[ 
https://issues.apache.org/jira/browse/SPARK-33763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282201#comment-17282201
 ] 

Attila Zsolt Piros edited comment on SPARK-33763 at 2/10/21, 3:30 AM:
----------------------------------------------------------------------

I am not convinced by the "stage re-submitted because of fetch failure" 
solution either, as "stages.failedStages.count" is already available and most 
failed stages are retried.

Once the tests on my PR (which contains the counter metrics for the different 
loss reasons) have finished, I will reopen it as a non-WIP PR (or remove the 
WIP label).
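
To illustrate the idea of one counter per loss reason, here is a minimal 
sketch in Python. It is not the code from the PR: the reason categories, the 
class name, and the metric names are all assumptions made up for this 
example, loosely following the items in the issue description.

```python
from collections import Counter
from enum import Enum


class LossReason(Enum):
    # Hypothetical categories, not Spark's actual loss-reason types.
    GRACEFUL_DECOMMISSION = "gracefulDecommission"
    DA_SCHEDULED_DELETE = "daScheduledDelete"
    UNEXPECTED = "unexpected"


class ExecutorLossCounters:
    """Keeps one monotonically increasing counter per loss reason."""

    def __init__(self):
        # Start every reason at zero so it is reported even before the first loss.
        self._counts = Counter({reason: 0 for reason in LossReason})

    def record(self, reason):
        # Called once for each executor that is lost for the given reason.
        self._counts[reason] += 1

    def snapshot(self):
        # The metric names below are illustrative only.
        return {
            f"executors.lost.{reason.value}.count": count
            for reason, count in self._counts.items()
        }


counters = ExecutorLossCounters()
counters.record(LossReason.GRACEFUL_DECOMMISSION)
counters.record(LossReason.UNEXPECTED)
counters.record(LossReason.UNEXPECTED)
print(counters.snapshot())
```

Keeping the reasons as separate counters, rather than one combined "lost 
executors" total, is what lets a dashboard tell expected removals apart from 
unexpected ones.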



> Add metrics for better tracking of dynamic allocation
> -----------------------------------------------------
>
>                 Key: SPARK-33763
>                 URL: https://issues.apache.org/jira/browse/SPARK-33763
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.2.0
>            Reporter: Holden Karau
>            Priority: Major
>
> We should add metrics to track the following:
> 1- Graceful decommissions & DA scheduled deletes
> 2- Jobs resubmitted
> 3- Fetch failures
> 4- Unexpected (e.g. non-Spark triggered) executor removals.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

