Gour Saha created SLIDER-930:
--------------------------------

             Summary: Incorporate Yarn feature of resetting AM failure count 
into Slider AM
                 Key: SLIDER-930
                 URL: https://issues.apache.org/jira/browse/SLIDER-930
             Project: Slider
          Issue Type: Bug
          Components: appmaster
    Affects Versions: Slider 0.80
            Reporter: Gour Saha


YARN-611 provides this feature. Currently Slider apps are bound by the number 
of AM attempts set for yarn.resourcemanager.am.max-attempts in the cluster. By 
default this value is 2, which is very low for long-running services.

Slider AM should use the feature provided in YARN-611 and set an interval after 
which the failure count will be reset to 0.

I believe the API to call on ApplicationSubmissionContext is 
setAttemptFailuresValidityInterval, which takes the interval in milliseconds. 
To start with, Slider can set it to 5 minutes, which should be a reasonable 
default.
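
To make the intended behaviour concrete, here is a minimal, self-contained 
sketch of how a validity interval changes failure counting. It is a toy model 
and not YARN or Slider code: the class AmFailureWindow and its methods are 
hypothetical names invented for illustration. It assumes that only failures 
inside the most recent interval count against the attempt limit, and that a 
non-positive interval disables the window. In Slider itself the equivalent 
would presumably be a single call to setAttemptFailuresValidityInterval on the 
ApplicationSubmissionContext with the 5 minute value in milliseconds.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;

/** Toy model of AM failure counting with a validity interval (hypothetical, not YARN code). */
public class AmFailureWindow {
    /** Proposed default from this issue: 5 minutes, expressed in milliseconds. */
    public static final long DEFAULT_INTERVAL_MS = TimeUnit.MINUTES.toMillis(5);

    private final int maxAttempts;
    private final long validityIntervalMs;
    // Timestamps of recorded failures, oldest first.
    private final Deque<Long> failureTimesMs = new ArrayDeque<>();

    public AmFailureWindow(int maxAttempts, long validityIntervalMs) {
        this.maxAttempts = maxAttempts;
        this.validityIntervalMs = validityIntervalMs;
    }

    /** Number of failures that still count at time nowMs. */
    public int countedFailures(long nowMs) {
        // Assumption: a non-positive interval means "never forget failures".
        if (validityIntervalMs > 0) {
            while (!failureTimesMs.isEmpty()
                    && nowMs - failureTimesMs.peekFirst() > validityIntervalMs) {
                failureTimesMs.removeFirst(); // failure is older than the window
            }
        }
        return failureTimesMs.size();
    }

    /** Records a failure at nowMs; returns true if another attempt is still allowed. */
    public boolean recordFailure(long nowMs) {
        failureTimesMs.addLast(nowMs);
        return countedFailures(nowMs) < maxAttempts;
    }
}
```

With a limit of 2 and the 5 minute window, two failures more than 5 minutes 
apart never exhaust the limit, while two failures within the same window do. 
With the window disabled, the second failure is always the last.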



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
