GitHub user kevinxu021 commented on a diff in the pull request:
https://github.com/apache/trafodion/pull/1427#discussion_r172729543
--- Diff: dcs/src/main/java/org/trafodion/dcs/master/DcsMaster.java ---
@@ -111,11 +104,59 @@ public DcsMaster(String[] args) {
trafodionHome = System.getProperty(Constants.DCS_TRAFODION_HOME);
jvmShutdownHook = new JVMShutdownHook();
Runtime.getRuntime().addShutdownHook(jvmShutdownHook);
- thrd = new Thread(this);
- thrd.start();
+
+ ExecutorService executorService = Executors.newFixedThreadPool(1);
+ CompletionService<Integer> completionService = new ExecutorCompletionService<Integer>(executorService);
+
+ while (true) {
+ completionService.submit(this);
+ Future<Integer> f = null;
+ try {
+ f = completionService.take();
+ if (f != null) {
+ Integer status = f.get();
+ if (status <= 0) {
+ System.exit(status);
+ } else {
+ // 35000 * 15mins ~= 1 years
+ RetryCounter retryCounter = RetryCounterFactory.create(35000, 15, TimeUnit.MINUTES);
+ while (true) {
+ try {
+ ZkClient tmpZkc = new ZkClient();
+ tmpZkc.connect();
+ tmpZkc.close();
+ tmpZkc = null;
+ LOG.info("Connected to ZooKeeper successful, restart DCS Master.");
+ // reset lock
+ isLeader = new CountDownLatch(1);
+ break;
--- End diff --
As we discussed, a lost ZooKeeper connection is already covered by the
session-expired event, so this loop is unnecessary.
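
For illustration only, here is a minimal sketch of what the restart loop
might look like without the inner ZooKeeper probe loop. It is not the
actual DcsMaster code: the class name RestartLoopSketch and the method
supervise are hypothetical, and it assumes that a positive status means
"session expired, restart" while a status <= 0 means "exit". It returns
the exit status instead of calling System.exit so it can be run standalone.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the supervising loop in the DcsMaster constructor.
class RestartLoopSketch {

    // Runs the master task repeatedly until it reports a status <= 0,
    // then returns that status.
    static int supervise(Callable<Integer> master) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        CompletionService<Integer> completion =
                new ExecutorCompletionService<Integer>(pool);
        try {
            while (true) {
                completion.submit(master);
                // take() blocks until the submitted task has finished.
                int status = completion.take().get();
                if (status <= 0) {
                    // The code in the diff calls System.exit(status) here.
                    return status;
                }
                // Assumption: status > 0 was produced by the session-expired
                // handler, so the task is resubmitted directly, with no
                // separate connect/close probe against ZooKeeper.
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Any state that the diff resets before a restart (for example the isLeader
latch) would still need to be reset before the task is resubmitted.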
---