ddanielr commented on code in PR #5321:
URL: https://github.com/apache/accumulo/pull/5321#discussion_r1962292859
##########
assemble/bin/accumulo-cluster:
##########
@@ -471,6 +477,13 @@ function control_services() {
fi
fi
done
+ if [[ $operation == "stop" || $operation == "kill" ]]; then
+ # If the prior commands were executed via ssh, then we need to wait for them
+ # to complete before zapping the nodes in ZooKeeper
+ ssh_wait
+ echo "Cleaning tablet server entries from zookeeper for resource group $group"
+ debugOrRun "$accumulo_cmd" org.apache.accumulo.server.util.ZooZap -verbose -tservers -group "$group"
Review Comment:
If `./accumulo-cluster stop --tservers=group1 --local` is run, this
ZooZap command will remove the locks for both local and remote tservers in
`group1`.
This seems like a common failure point: an admin who intends to stop only
the "local" services would instead shut down that group's tservers across the
entire cluster.
That could be fixed by adding logic that checks for the `--local` arg and only
removes entries when it's a cluster-wide action.
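A rough sketch of that guard, assuming the parsed `--local` flag is available as a value (passed here as the second argument, `1` meaning set). `should_zap_tservers` is a hypothetical helper name, not something that exists in accumulo-cluster today.

```bash
# Hypothetical helper: succeed (exit status 0) only when ZooZap cleanup
# should run after a stop/kill.
# ASSUMPTION: $2 carries the parsed --local flag ("1" when --local was given);
# the real script may store that flag under a different name.
should_zap_tservers() {
  local operation="$1" local_only="$2"
  # Only stop and kill can leave stale tserver entries behind.
  [[ $operation == "stop" || $operation == "kill" ]] || return 1
  # A --local run only touched this host, so skip the cleanup entirely rather
  # than zapping the whole group, including tservers on other hosts.
  [[ $local_only != "1" ]]
}
```

The stop/kill block in the hunk above would then test `should_zap_tservers "$operation" "$local_only"` instead of checking `$operation` alone (`$local_only` is likewise a hypothetical variable), leaving the `ssh_wait` and `debugOrRun ... ZooZap` calls as they are.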
Alternatively, ZooZap could be modified to accept a `-host` filter, similar to
the `-group` option, replacing its current use of `AddressSelector.all()`.
I'm guessing it couldn't be an exact match, since cluster.yaml has the hostname
but not the port.
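A rough sketch of hostname-only matching, written in shell just to illustrate the rule; an actual `-host` option would live in ZooZap's Java code. It assumes server entries are named `host:port` and that the hostnames from cluster.yaml are passed as one space-separated string. `filter_by_host` is a hypothetical name.

```bash
# Hypothetical helper illustrating hostname-only matching: print each
# "host:port" entry whose host part is in the space-separated host list.
# ASSUMPTION: entries are named "host:port"; cluster.yaml supplies hosts only.
# Usage: filter_by_host "host1 host2" host1:9997 host3:9997 ...
filter_by_host() {
  # Pad with spaces so the match below compares whole hostnames only.
  local hosts=" $1 " entry
  shift
  for entry in "$@"; do
    # ${entry%:*} drops the trailing ":port" before comparing.
    if [[ $hosts == *" ${entry%:*} "* ]]; then
      printf '%s\n' "$entry"
    fi
  done
}
```

Comparing whole hostnames rather than prefixes keeps `tserver1` from also matching `tserver10:9997`.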
--
This is an automated message from the Apache Git Service.