[ 
https://issues.apache.org/jira/browse/TRAFODION-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354514#comment-16354514
 ] 

ASF GitHub Bot commented on TRAFODION-2940:
-------------------------------------------

Github user DaveBirdsall commented on a diff in the pull request:

    https://github.com/apache/trafodion/pull/1427#discussion_r166434382
  
    --- Diff: dcs/src/main/java/org/trafodion/dcs/master/DcsMaster.java ---
    @@ -111,11 +104,59 @@ public DcsMaster(String[] args) {
             trafodionHome = System.getProperty(Constants.DCS_TRAFODION_HOME);
             jvmShutdownHook = new JVMShutdownHook();
             Runtime.getRuntime().addShutdownHook(jvmShutdownHook);
    -        thrd = new Thread(this);
    -        thrd.start();
    +
    +        ExecutorService executorService = Executors.newFixedThreadPool(1);
    +        CompletionService<Integer> completionService = new ExecutorCompletionService<Integer>(executorService);
    +
    +        while (true) {
    +            completionService.submit(this);
    +            Future<Integer> f = null;
    +            try {
    +                f = completionService.take();
    +                if (f != null) {
    +                    Integer status = f.get();
    +                    if (status <= 0) {
    +                        System.exit(status);
    +                    } else {
    +                        // 35000 * 15mins ~= 1 years
    +                        RetryCounter retryCounter = RetryCounterFactory.create(35000, 15, TimeUnit.MINUTES);
    +                        while (true) {
    +                            try {
    +                                ZkClient tmpZkc = new ZkClient();
    +                                tmpZkc.connect();
    +                                tmpZkc.close();
    +                                tmpZkc = null;
    +                                LOG.info("Connected to ZooKeeper successful, restart DCS Master.");
    +                                // reset lock
    +                                isLeader = new CountDownLatch(1);
    +                                break;
    --- End diff --
    
    I'm not sure I understand this logic. Do we sit inside one of these method
    calls during normal processing? Or do tmpZkc.connect() and tmpZkc.close()
    complete immediately? If they return immediately, it looks like we just go
    around the while loop and do it again, over and over.
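
As a reference point for the question above, here is a minimal standalone sketch of the submit/take pattern used in the diff. It is not the DcsMaster code: the class and method names (RestartLoopSketch, superviseUntilExit) are hypothetical, the ZooKeeper reconnect wait is replaced by a comment, and it assumes the submitted task returns only when the master stops (status <= 0) or needs a restart (status > 0). Under that assumption, take() is the call that blocks during normal processing, so the outer loop goes around once per task completion rather than spinning.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RestartLoopSketch {

    /**
     * Submits the task repeatedly on a single worker thread, mirroring the
     * outer while (true) loop in the diff. Returns {status, submissions},
     * where status is the first value <= 0 (the point where the diff calls
     * System.exit) and submissions counts how often the task was started.
     */
    public static int[] superviseUntilExit(Callable<Integer> task) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(1);
        CompletionService<Integer> completionService =
                new ExecutorCompletionService<Integer>(executor);
        int submissions = 0;
        try {
            while (true) {
                completionService.submit(task);
                submissions++;
                // take() blocks until the submitted task has finished, so the
                // loop does not advance while the task is still running.
                Future<Integer> f = completionService.take();
                int status = f.get();
                if (status <= 0) {
                    return new int[] { status, submissions };
                }
                // status > 0: the diff waits here until ZooKeeper is reachable
                // again (omitted in this sketch), then loops to resubmit.
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical task: asks for a restart twice, then exits cleanly.
        final Iterator<Integer> statuses = Arrays.asList(1, 1, 0).iterator();
        int[] result = superviseUntilExit(() -> statuses.next());
        System.out.println("status=" + result[0] + ", submissions=" + result[1]);
        // prints "status=0, submissions=3"
    }
}
```

Whether the real loop spins therefore depends on how quickly the submitted Callable returns, which this excerpt of the diff does not show.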


> In HA env, one node lose network, when recover, trafci can't use
> ----------------------------------------------------------------
>
>                 Key: TRAFODION-2940
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2940
>             Project: Apache Trafodion
>          Issue Type: Bug
>    Affects Versions: any
>            Reporter: mashengchen
>            Assignee: mashengchen
>            Priority: Major
>             Fix For: 2.3
>
>
> In an HA environment, if one node loses its network connection for a long
> time, then once the network recovers there are two floating IPs and two
> working DCS masters, and trafci can't be used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
