errose28 commented on code in PR #4505:
URL: https://github.com/apache/ozone/pull/4505#discussion_r1162238008
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java:
##########
@@ -1746,6 +1766,13 @@ public ContainerManager getContainerManager() {
return containerManager;
}
+ /**
+ * Returns SCM lease manager.
+ */
+ public LeaseManager getLeaseManager() {
+ return leaseManager;
+ }
Review Comment:
It might be cleaner to inject the LeaseManager using `SCMConfigurator`
rather than exposing a getter for testing.
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/CloseContainerEventHandler.java:
##########
@@ -121,6 +135,15 @@ public void onMessage(ContainerID containerID,
EventPublisher publisher) {
}
}
+ private Void triggerCloseCallback(
Review Comment:
Some comments on this method and the place in this class where the lease
manager is used would be good to explain why we do not close the container
right away.
##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconStorageContainerManagerFacade.java:
##########
@@ -246,9 +241,15 @@ public
ReconStorageContainerManagerFacade(OzoneConfiguration conf,
IncrementalContainerReportHandler icrHandler =
new ReconIncrementalContainerReportHandler(nodeManager,
containerManager, scmContext);
+ long timeDuration = conf.getTimeDuration(
+ OzoneConfigKeys.OZONE_SCM_CLOSE_CONTAINER_WAIT_DURATION,
+ OzoneConfigKeys.OZONE_SCM_CLOSE_CONTAINER_WAIT_DURATION_DEFAULT
+ .getDuration(), TimeUnit.MILLISECONDS);
+ leaseManager = new LeaseManager<>("Lease Manager", timeDuration);
CloseContainerEventHandler closeContainerHandler =
new CloseContainerEventHandler(
- pipelineManager, containerManager, scmContext);
+ pipelineManager, containerManager, scmContext,
+ leaseManager, timeDuration);
Review Comment:
Since Recon can't send close commands to DNs (`ReconNodeManager#onMessage`)
this thread will be a no-op. Can we make a Recon specific version that does not
run this thread? For example, Recon has its own handler for dead nodes:
`ReconDeadNodeHandler`.
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/CloseContainerEventHandler.java:
##########
@@ -104,14 +113,19 @@ public void onMessage(ContainerID containerID,
EventPublisher publisher) {
command.setTerm(scmContext.getTermOfLeader());
command.setEncodedToken(getContainerToken(containerID));
- getNodes(container).forEach(node ->
- publisher.fireEvent(DATANODE_COMMAND,
- new CommandForDatanode<>(node.getUuid(), command)));
+ try {
+ leaseManager.acquire(command, timeout, () -> triggerCloseCallback(
+ publisher, container, command));
Review Comment:
Initially I thought we could do
```
leaseManager.acquire(command, timeout).registerCallBack(() ->
triggerCloseCallback(
publisher, container, command));
```
but this has a subtle race since `acquire` may have already notified
listeners by the time the callback is added. We don't have to change it here,
but in a follow up PR it might be good to change the lease manager so this
isn't possible, maybe by not returning the `Lease` object from `acquire`. We
could also remove dead code in `EventWatcher` that is doing this currently.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]