ctubbsii commented on code in PR #5315:
URL: https://github.com/apache/accumulo/pull/5315#discussion_r1947456390
##########
minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java:
##########
@@ -795,18 +792,18 @@ public synchronized void stop() throws IOException,
InterruptedException {
// is restarted, then the processes will start right away
// and not wait for the old locks to be cleaned up.
try {
- new ZooZap().zap(getServerContext().getSiteConfiguration(), "-manager",
- "-compaction-coordinators", "-tservers", "-compactors", "-sservers");
+ new ZooZap().zap(getServerContext().getZooSession(),
+ getServerContext().getSiteConfiguration(), "-manager",
"-compaction-coordinators",
+ "-tservers", "-compactors", "-sservers");
} catch (RuntimeException e) {
log.error("Error zapping zookeeper locks", e);
}
- control.stop(ServerType.ZOOKEEPER, null);
// Clear the location of the servers in ZooCache.
- // When ZooKeeper was stopped in the previous method call,
- // the local ZooKeeper watcher did not fire. If MAC is
- // restarted, then ZooKeeper will start on the same port with
- // the same data, but no Watchers will fire.
+ // If MAC is restarted, then ZooKeeper will start
+ // on the same port with the same data and the
+ // start of the server processes will wait until
+ // the pre-existing ephemeral locks time out.
boolean startCalled = true;
try {
getServerContext().getZooKeeperRoot();
Review Comment:
This whole block is sketchy. This seems to be relying on a side-effect of
ServerContext.getZooKeeperRoot() having already computed the root from the
instanceId... but you could just call getInstanceId() instead to verify that it
has been looked up.
However, in order to even make it this far, the
`getServerContext().getZooSession()` passed to the above ZooZap call will have
already had to do all this... at least... it will have had to in order to use
the context's ZooSession, which will have already been chrooted. That's why
ZooZap takes config only... so it can create its own non-chrooted ZooSession
that doesn't depend on whether the cluster has been initialized yet or not.
##########
server/base/src/main/java/org/apache/accumulo/server/ServerContext.java:
##########
@@ -140,8 +140,15 @@ private ServerContext(ServerInfo info) {
* Used during initialization to set the instance name and ID.
*/
public static ServerContext initialize(SiteConfiguration siteConfig, String
instanceName,
- InstanceId instanceID) {
- return new ServerContext(ServerInfo.initialize(siteConfig, instanceName,
instanceID));
+ InstanceId instanceId) {
+ return new ServerContext(ServerInfo.initialize(siteConfig, instanceName,
instanceId)) {
+
+ @Override
+ public InstanceId getInstanceID() {
+ return instanceId;
+ }
+
+ };
Review Comment:
A similar thing is already being done in the ServerInfo.initialize() today.
Initialize doesn't create a an unnecessary ZooSession to do this lookup, so
there's nothing to fix here. It's already handled by ServerInfo.
##########
server/base/src/main/java/org/apache/accumulo/server/ServerInfo.java:
##########
@@ -57,18 +58,17 @@ public class ServerInfo implements ClientInfo {
// set things up using the config file, the instanceId from HDFS, and ZK for
the instanceName
static ServerInfo fromServerConfig(SiteConfiguration siteConfig) {
final Function<ServerInfo,String> instanceNameFromZk = si -> {
+ InstanceId iid = VolumeManager.getInstanceIDFromHdfs(
+
si.getServerDirs().getInstanceIdLocation(si.getVolumeManager().getFirst()),
+ si.getHadoopConf());
Review Comment:
By merging this instanceId lookup into the function to look up the
instanceName, and removing the function to track the instanceId, that creates
an unnecessary need to look up the instanceId in ZooKeeper later.
One of the goals of the previous iteration I did in here was to make sure
that we don't do unnecessary lookups like that. The ServerInfo should already
have the instanceId from the volume manager lookup, so when ServerContext asks
ServerInfo for it, it should never need to go back to ZooKeeper, since it
already has it. Here, it goes to the volume manager, but then doesn't remember
it, so ZooKeeper must be consulted later.
##########
core/src/main/java/org/apache/accumulo/core/clientImpl/ClientContext.java:
##########
@@ -230,6 +231,9 @@ public ClientContext(SingletonReservation reservation,
ClientInfo info,
return zk;
});
+ this.instanceId =
+ memoize(() -> ZooUtil.getInstanceId(this.zooSession.get(),
getInstanceName()));
+
Review Comment:
This isn't compatible with the chroot effort. `this.zooSession` is going to
be chrooted, but the lookup of the mapping between instanceName/instanceId
lives outside the chroot hierarchy.
##########
server/base/src/main/java/org/apache/accumulo/server/util/ZooZap.java:
##########
@@ -94,13 +94,15 @@ public void execute(String[] args) throws Exception {
if (siteConf.getBoolean(Property.INSTANCE_RPC_SASL_ENABLED)) {
SecurityUtil.serverLogin(siteConf);
}
- zap(siteConf, args);
+ try (var zk = new ZooSession(getClass().getSimpleName(), siteConf)) {
+ zap(zk, siteConf, args);
+ }
} finally {
SingletonManager.setMode(Mode.CLOSED);
}
}
- public void zap(SiteConfiguration siteConf, String... args) {
+ public void zap(ZooSession zk, SiteConfiguration siteConf, String... args) {
Review Comment:
The two use cases should be:
1. ZooZap.execute(mainArgs), or
2. ZooZap.zap(serverContext, options)
The execute version should do the options parsing, and the other should just
be able to be called directly, with options suitable for code (like a Set of
services to zap, rather than option strings like `-manager`).
##########
minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java:
##########
@@ -795,18 +792,18 @@ public synchronized void stop() throws IOException,
InterruptedException {
// is restarted, then the processes will start right away
// and not wait for the old locks to be cleaned up.
try {
- new ZooZap().zap(getServerContext().getSiteConfiguration(), "-manager",
- "-compaction-coordinators", "-tservers", "-compactors", "-sservers");
+ new ZooZap().zap(getServerContext().getZooSession(),
+ getServerContext().getSiteConfiguration(), "-manager",
"-compaction-coordinators",
+ "-tservers", "-compactors", "-sservers");
} catch (RuntimeException e) {
log.error("Error zapping zookeeper locks", e);
}
- control.stop(ServerType.ZOOKEEPER, null);
// Clear the location of the servers in ZooCache.
- // When ZooKeeper was stopped in the previous method call,
- // the local ZooKeeper watcher did not fire. If MAC is
- // restarted, then ZooKeeper will start on the same port with
- // the same data, but no Watchers will fire.
+ // If MAC is restarted, then ZooKeeper will start
+ // on the same port with the same data and the
+ // start of the server processes will wait until
+ // the pre-existing ephemeral locks time out.
Review Comment:
I think the old version of this comment is wrong, so it should be changed.
However, I think these changes aren't quite correct either. It doesn't matter,
for example, that ZooKeeper is restarting on the same port with the same data,
only that the locks created by the Accumulo servers still exist until they time
out. That's why ZooZap is called. ZooCache is only cleared here, because it has
no idea what ZooZap did, so we want to make sure the current process is updated.
If my suggestion to make ZooZap take a context is applied here, then ZooZap
can clear ZooCache itself, and there's no need for much of the rest of this
code at all.
Also, the comment above the try block is sufficient to explain that the
locks need cleared after stopping the services. However, it has some typos in
it.
If ZooCache is cleared next, then the only thing this comment needs to say
is something like:
```suggestion
// Clear the zapped locations from ZooCache
```
##########
minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterControl.java:
##########
@@ -283,7 +283,8 @@ public synchronized void stop(ServerType server, String
hostname) throws IOExcep
try {
cluster.stopProcessWithTimeout(managerProcess, 30,
TimeUnit.SECONDS);
try {
- new
ZooZap().zap(cluster.getServerContext().getSiteConfiguration(), "-manager");
+ new ZooZap().zap(cluster.getServerContext().getZooSession(),
+ cluster.getServerContext().getSiteConfiguration(),
"-manager");
Review Comment:
I think it would be worth having a zap method that takes a context here, but
not a ZooSession. The context will have the ZooSession, and it will be chrooted
properly for ZooZap to work with, once the chroot stuff is done. It should not
take a ZooSession directly, because it won't know the relative paths without
knowing whether the passed in ZooSession is chrooted or not.
##########
minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java:
##########
@@ -642,79 +642,76 @@ private void verifyUp() throws InterruptedException,
IOException {
waitForProcessStart(tsp, "TabletServer" + tsExpectedCount);
}
- String secret = getSiteConfiguration().get(Property.INSTANCE_SECRET);
String instanceNamePath = Constants.ZROOT + Constants.ZINSTANCES + "/" +
getInstanceName();
- try (var zk = new ZooSession(MiniAccumuloClusterImpl.class.getSimpleName()
+ ".verifyUp()",
- getZooKeepers(), 60000, secret)) {
- var rdr = zk.asReader();
- InstanceId instanceId = null;
- for (int i = 0; i < numTries; i++) {
- try {
- // make sure it's up enough we can perform operations successfully
- rdr.sync("/");
- // wait for the instance to be created
- instanceId = InstanceId.of(new String(rdr.getData(instanceNamePath),
UTF_8));
- break;
- } catch (KeeperException e) {
- log.warn("Error trying to read instance id from zookeeper: {}",
e.getMessage());
- log.debug("Unable to read instance id from zookeeper.", e);
- }
- Thread.sleep(1000);
- }
-
- if (instanceId == null) {
- try {
- log.warn("******* COULD NOT FIND INSTANCE ID - DUMPING ZK
************");
- log.warn("Connected to ZooKeeper: {}", getZooKeepers());
- log.warn("Looking for instanceId at {}", instanceNamePath);
- ZKUtil.visitSubTreeDFS(zk, Constants.ZROOT, false,
- (rc, path, ctx, name) -> log.warn("{}", path));
- log.warn("******* END ZK DUMP ************");
- } catch (KeeperException e) {
- log.error("Error dumping zk", e);
- }
- throw new IllegalStateException("Unable to find instance id from
zookeeper.");
- }
-
- String rootPath = ZooUtil.getRoot(instanceId);
- int tsActualCount = 0;
+ ZooSession zk = getServerContext().getZooSession();
Review Comment:
This ZooSession instance must be separate because it is checking the
instanceName/instanceId hierarchy, outside the (to-be) chrooted hierarchy that
the context.getZooSession() will be. This is specific to 3.1, though. The main
branch has a complete rewrite of verifyUp, so it doesn't need to do this
directly. I actually think it'd be better to backport those changes to 3.1, if
possible, or else it should just leave this ZooSession alone... or it should be
rewritten to only check for things inside the chroot hierarchy, and not the
instanceName/instanceId mapping, in which case it could use the ZooSession from
the context.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]