[ 
https://issues.apache.org/jira/browse/TWILL-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852063#comment-15852063
 ] 

ASF GitHub Bot commented on TWILL-181:
--------------------------------------

Github user hsaputra commented on a diff in the pull request:

    https://github.com/apache/twill/pull/23#discussion_r99416151
  
    --- Diff: twill-yarn/src/main/java/org/apache/twill/internal/appmaster/RunningContainers.java ---
    @@ -477,10 +493,30 @@ void handleCompleted(YarnContainerStatus status, Multiset<String> restartRunnabl
         }
       }
     
    -  private boolean shouldRetry(int exitCode) {
    -    return exitCode != ContainerExitCodes.SUCCESS
    -      && exitCode != ContainerExitCodes.DISKS_FAILED
    -      && exitCode != ContainerExitCodes.INIT_FAILED;
    +  private boolean shouldRetry(String runnableName, int instanceId, int exitCode) {
    +    boolean possiblyRetry = 
    +        exitCode != ContainerExitCodes.SUCCESS && 
    +        exitCode != ContainerExitCodes.DISKS_FAILED && 
    +        exitCode != ContainerExitCodes.INIT_FAILED;
    +    
    +    if (possiblyRetry) {
    +      int max = getMaxRetries(runnableName);
    +      if (max == Integer.MAX_VALUE) {
    +        return true; // retry without special log msg
    +      }
    +
    +      if (getRetryCount(runnableName, instanceId) == max) {
    --- End diff --
    
    Since we call `getRetryCount` for this check, we might as well cache it in a 
local variable so that the comparison and the log message use the same value for this request:
    
    ```
    int retryCount = getRetryCount(runnableName, instanceId);
    if (retryCount == max) {
      ...
    } else {
      LOG.info("Attempting {} of {} retries for instance {} of runnable {}.",
        retryCount + 1, max, instanceId, runnableName);
      return true;
    }
    ```
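
For illustration only, here is a rough, self-contained sketch of the same idea outside of Twill. The class `RetryPolicySketch`, its map-based bookkeeping, `setMaxRetries`, `recordRetry`, and the exit-code values are all hypothetical stand-ins, not the actual `RunningContainers` code, and returning `false` once the limit is reached is an assumption about the branch elided above. The point is only that the count is read once and reused for both the check and the log message.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the retry bookkeeping; not Twill's actual code. */
class RetryPolicySketch {
  // Placeholder exit codes for this sketch; the real ContainerExitCodes values may differ.
  static final int SUCCESS = 0;
  static final int DISKS_FAILED = -101;
  static final int INIT_FAILED = -1000;

  private final Map<String, Integer> maxRetries = new HashMap<>();
  private final Map<String, Integer> retryCounts = new HashMap<>();

  void setMaxRetries(String runnableName, int max) {
    maxRetries.put(runnableName, max);
  }

  // Integer.MAX_VALUE means "no limit configured", as in the diff.
  int getMaxRetries(String runnableName) {
    return maxRetries.getOrDefault(runnableName, Integer.MAX_VALUE);
  }

  int getRetryCount(String runnableName, int instanceId) {
    return retryCounts.getOrDefault(runnableName + ":" + instanceId, 0);
  }

  void recordRetry(String runnableName, int instanceId) {
    retryCounts.merge(runnableName + ":" + instanceId, 1, Integer::sum);
  }

  boolean shouldRetry(String runnableName, int instanceId, int exitCode) {
    boolean possiblyRetry =
        exitCode != SUCCESS && exitCode != DISKS_FAILED && exitCode != INIT_FAILED;
    if (!possiblyRetry) {
      return false;
    }

    int max = getMaxRetries(runnableName);
    if (max == Integer.MAX_VALUE) {
      return true; // unlimited retries, no special log message
    }

    // Read the count once so the check and the log line agree.
    int retryCount = getRetryCount(runnableName, instanceId);
    if (retryCount == max) {
      return false; // assumed behavior once the limit is reached
    }
    System.out.printf("Attempting %d of %d retries for instance %d of runnable %s.%n",
        retryCount + 1, max, instanceId, runnableName);
    return true;
  }
}
```

With a limit of 2 for a runnable, this sketch allows two retries for a given instance and refuses the third, while runnables without a configured limit are always retried on a retryable exit code.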



> Control the maximum number of retries for failed application starts
> -------------------------------------------------------------------
>
>                 Key: TWILL-181
>                 URL: https://issues.apache.org/jira/browse/TWILL-181
>             Project: Apache Twill
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 0.7.0-incubating
>            Reporter: Martin Serrano
>            Assignee: Martin Serrano
>             Fix For: 0.10.0
>
>
> If an application consistently exits with a non-zero code, Twill will 
> attempt to restart it indefinitely. I ran into this issue, and a list search 
> also reveals [others|http://markmail.org/message/dehx7r6tpqgcmjh4]. 
> There should be a mechanism to specify the maximum number of retries until 
> the application fails. Ideally, by default there would be a non-infinite 
> maximum.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
