[ 
https://issues.apache.org/jira/browse/TRAFODION-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16388986#comment-16388986
 ] 

ASF GitHub Bot commented on TRAFODION-2940:
-------------------------------------------

Github user kevinxu021 commented on a diff in the pull request:

    https://github.com/apache/trafodion/pull/1427#discussion_r172729543
  
    --- Diff: dcs/src/main/java/org/trafodion/dcs/master/DcsMaster.java ---
    @@ -111,11 +104,59 @@ public DcsMaster(String[] args) {
             trafodionHome = System.getProperty(Constants.DCS_TRAFODION_HOME);
             jvmShutdownHook = new JVMShutdownHook();
             Runtime.getRuntime().addShutdownHook(jvmShutdownHook);
    -        thrd = new Thread(this);
    -        thrd.start();
    +
    +        ExecutorService executorService = Executors.newFixedThreadPool(1);
    +        CompletionService<Integer> completionService = new 
ExecutorCompletionService<Integer>(executorService);
    +
    +        while (true) {
    +            completionService.submit(this);
    +            Future<Integer> f = null;
    +            try {
    +                f = completionService.take();
    +                if (f != null) {
    +                    Integer status = f.get();
    +                    if (status <= 0) {
    +                        System.exit(status);
    +                    } else {
    +                        // 35000 * 15mins ~= 1 years
    +                        RetryCounter retryCounter = 
RetryCounterFactory.create(35000, 15, TimeUnit.MINUTES);
    +                        while (true) {
    +                            try {
    +                                ZkClient tmpZkc = new ZkClient();
    +                                tmpZkc.connect();
    +                                tmpZkc.close();
    +                                tmpZkc = null;
    +                                LOG.info("Connected to ZooKeeper 
successful, restart DCS Master.");
    +                                // reset lock
    +                                isLeader = new CountDownLatch(1);
    +                                break;
    --- End diff --
    
    As we discussed, Zookeeper connection lost has been covered by session 
expired event, so this loop is useless.


> In HA env, one node lose network, when recover, trafci can't use
> ----------------------------------------------------------------
>
>                 Key: TRAFODION-2940
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2940
>             Project: Apache Trafodion
>          Issue Type: Bug
>    Affects Versions: any
>            Reporter: mashengchen
>            Assignee: mashengchen
>            Priority: Major
>             Fix For: 2.3
>
>
> In HA env, if one node lose network for a long time , once network recover, 
> there will have two floating ip, two working dcs master, and trafci can't be 
> use.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to