wangxiaojing123 commented on a change in pull request #664: KYLIN-4017 Build
engine get zk(zookeeper) lock failed when building job, it causes the whole
build engine doesn't work
URL: https://github.com/apache/kylin/pull/664#discussion_r290618823
##########
File path: core-common/src/main/java/org/apache/kylin/common/util/ZKUtil.java
##########
@@ -84,7 +84,7 @@ public void onRemoval(RemovalNotification<String,
CuratorFramework> notification
logger.error("Error at closing " + curator, ex);
}
}
- }).expireAfterWrite(1, TimeUnit.DAYS).build();
+ }).expireAfterWrite(10000, TimeUnit.DAYS).build();//never expired
Review comment:
> If the cache entry expires after 1 day, the removal listener runs
curator.close(); in other words, the client created by newZookeeperClient is
closed. But that client should stay in the started state for the whole
lifecycle of the build engine, because it is used when building segments. If
the client is no longer started, the job can't get the zk lock and can't build.
DistributedScheduler:
```java
public void run() {
    try (SetThreadName ignored = new SetThreadName("Scheduler %s Job %s",
            System.identityHashCode(DistributedScheduler.this), executable.getId())) {
        if (jobLock.lock(getLockPath(executable.getId()))) {
            logger.info(executable.toString() + " scheduled in server: " + serverName);
            context.addRunningJob(executable);
            jobWithLocks.add(executable.getId());
            executable.execute(context);
        }
    } catch (ExecuteException e) {
        logger.error("ExecuteException job:" + executable.getId() + " in server: " + serverName, e);
    } catch (Exception e) {
        logger.error("unknown error execute job:" + executable.getId() + " in server: " + serverName, e);
    } finally {
        context.removeRunningJob(executable);
        releaseJobLock(executable);
        // trigger the next step asap
        fetcherPool.schedule(fetcher, 0, TimeUnit.SECONDS);
    }
}
```
ZookeeperDistributedLock:
```java
public boolean lock(String lockPath) {
    logger.debug("{} trying to lock {}", client, lockPath);
    try {
        curator.create().creatingParentsIfNeeded().withMode(CreateMode.EPHEMERAL)
                .forPath(lockPath, clientBytes);
    } catch (KeeperException.NodeExistsException ex) {
        logger.debug("{} see {} is already locked", client, lockPath);
    } catch (Exception ex) {
        throw new IllegalStateException("Error while " + client + " trying to lock " + lockPath, ex);
    }
    String lockOwner = peekLock(lockPath);
    if (client.equals(lockOwner)) {
        logger.info("{} acquired lock at {}", client, lockPath);
        return true;
    } else {
        logger.debug("{} failed to acquire lock at {}, which is held by {}", client, lockPath, lockOwner);
        return false;
    }
}
```
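To make the failure mode above concrete, here is a minimal, self-contained
sketch in plain Java. It is not Kylin, Guava, or Curator code: `FakeClient`,
`ExpiringClientCache`, `tryLock` and `lockFailsAfterExpiry` are hypothetical
names, and the explicit `cleanUp` call is a simplification of how a real cache
runs its expiry maintenance. It also assumes that a long-lived component keeps
the client reference it originally obtained from the cache.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class ExpiredClientSketch {

    /** Hypothetical stand-in for a ZooKeeper client; it only tracks whether it is started. */
    static final class FakeClient implements AutoCloseable {
        private boolean started = true;

        boolean isStarted() {
            return started;
        }

        @Override
        public void close() {
            started = false;
        }
    }

    /** Hypothetical cache: entries older than ttlMillis are closed and dropped on cleanUp. */
    static final class ExpiringClientCache {
        private final long ttlMillis;
        private final Map<String, FakeClient> clients = new HashMap<>();
        private final Map<String, Long> writtenAt = new HashMap<>();

        ExpiringClientCache(long ttlMillis) {
            this.ttlMillis = ttlMillis;
        }

        FakeClient get(String key, long nowMillis) {
            cleanUp(nowMillis);
            FakeClient client = clients.get(key);
            if (client == null) {
                client = new FakeClient();
                clients.put(key, client);
                writtenAt.put(key, nowMillis);
            }
            return client;
        }

        /** Plays the role of the removal listener: expired clients are closed. */
        void cleanUp(long nowMillis) {
            Iterator<Map.Entry<String, Long>> it = writtenAt.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> entry = it.next();
                if (nowMillis - entry.getValue() >= ttlMillis) {
                    clients.remove(entry.getKey()).close();
                    it.remove();
                }
            }
        }
    }

    /** Mimics lock(): a client that is no longer started cannot create the lock node. */
    static boolean tryLock(FakeClient client) {
        if (!client.isStarted()) {
            throw new IllegalStateException("client is closed, cannot lock");
        }
        return true;
    }

    /** Returns true if a reference obtained at time 0 can no longer lock after elapsedMillis. */
    static boolean lockFailsAfterExpiry(long ttlMillis, long elapsedMillis) {
        ExpiringClientCache cache = new ExpiringClientCache(ttlMillis);
        FakeClient held = cache.get("zk", 0L); // the scheduler keeps this reference
        cache.cleanUp(elapsedMillis);          // time passes, cache maintenance runs
        try {
            tryLock(held);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        System.out.println(lockFailsAfterExpiry(day, day + 1));         // prints "true"
        System.out.println(lockFailsAfterExpiry(10000 * day, day + 1)); // prints "false"
    }
}
```

With a one-day expiry the held reference is closed behind the scheduler's back
and the lock attempt fails; with the 10000-day expiry from this change the
client stays started for any realistic uptime.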