[ https://issues.apache.org/jira/browse/TRAFODION-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063633#comment-16063633 ]

ASF GitHub Bot commented on TRAFODION-2668:
-------------------------------------------

Github user selvaganesang commented on a diff in the pull request:

    
    https://github.com/apache/incubator-trafodion/pull/1144#discussion_r124095950
    --- Diff: dcs/src/main/java/org/trafodion/dcs/server/ServerManager.java ---
    @@ -414,32 +431,40 @@ public Boolean call() throws Exception {
                                                                  // finish
                     if (f != null) {
                         Integer result = f.get();
    +                    LOG.debug("Server handler [" + instance + ":" + result + "] finished");
                         int childInstance = result.intValue();
                         // get the node id
    -                    boolean isRunning = serverHandlers[childInstance-1].serverMonitor.isPidRunning();
    -                    String nid = serverHandlers[childInstance-1].serverMonitor.nid;
    -                    String pid = serverHandlers[childInstance-1].serverMonitor.pid;
    -                    serverHandlers[childInstance-1] = null;
    -                    LOG.debug("Server handler [" + instance + ":" + result
    -                            + "] finished, restarting");
    -                    if (isRunning)
    -                        LOG.info("mxosrvr " + nid + "," + pid + " still running");
    -                    else
    -                        LOG.info("mxosrvr " + nid + "," + pid + " exited, restarting");
    +                    boolean isRunning = serverHandlers[childInstance - 1].serverMonitor.monitor();
    +                    String nid = serverHandlers[childInstance - 1].serverMonitor.nid;
    +                    String pid = serverHandlers[childInstance - 1].serverMonitor.pid;
    +                    int restartAttempts = serverHandlers[childInstance - 1].getRestartAttempts();
    +
    +                    serverHandlers[childInstance - 1] = null;
                         retryCounter = retryCounterFactory.create();
                         while (!isTrafodionRunning(nid)) {
    -                       if (!retryCounter.shouldRetry()) {
    -                          throw new IOException("Node " + nid + " is not Up");
    -                       } else {
    -                           retryCounter.sleepUntilNextRetry();
    -                           retryCounter.useRetry();
    -                       }
    -                   }
    -                   serverHandlers[childInstance-1] = new ServerHandler(childInstance);
    -                   completionService.submit(serverHandlers[childInstance-1]);
    +                        if (!retryCounter.shouldRetry()) {
    --- End diff --
    
    I think you need to either fix retryCounterFactory so that the retry
    counter does the looping for the retry count, or get rid of retryCounter
    completely and use the restartAttempts variable. In the latter case,
    line 445 shouldn't call retryCounter.shouldRetry().
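
To illustrate the first option, here is a minimal sketch of a helper that owns the whole retry loop, so the caller never has to combine shouldRetry(), sleepUntilNextRetry() and useRetry() by hand. BoundedRetry and awaitTrue are hypothetical names made up for this example. They are not part of the DCS code base and do not reflect the actual RetryCounter API.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of DCS: it keeps the attempt count and the
// sleep between attempts in one place.
final class BoundedRetry {
    private final int maxAttempts;
    private final long sleepMillis;

    BoundedRetry(int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.sleepMillis = sleepMillis;
    }

    /**
     * Checks the condition up to maxAttempts times, sleeping between failed
     * checks. Returns the number of checks made once the condition holds, or
     * -1 if the attempts ran out or the thread was interrupted.
     */
    int awaitTrue(BooleanSupplier condition) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status for the caller and give up.
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

Under that assumption, the loop in the diff could shrink to a single call such as awaitTrue(() -> isTrafodionRunning(nid)), throwing the existing IOException when it returns -1. A helper like this can also be unit tested with a stubbed condition, without starting a real node.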
    
    How was this change unit tested?


> one mxosrvr can't startup, dcsServer down
> -----------------------------------------
>
>                 Key: TRAFODION-2668
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2668
>             Project: Apache Trafodion
>          Issue Type: Bug
>            Reporter: mashengchen
>            Assignee: mashengchen
>
> If any mxosrvr hits an error while dcsserver is starting, dcsserver
> goes down.



