sumitagrawl commented on code in PR #6360:
URL: https://github.com/apache/ozone/pull/6360#discussion_r1549914557


##########
hadoop-hdds/docs/content/interface/ReconApi.md:
##########
@@ -927,6 +927,39 @@ Returns all the datanodes in the cluster.
         ]
      }
 ```
+
+### PUT /api/v1/datanodes/remove
+
+**Parameters**
+
+* uuids (List of node uuids in JSON array format).
+
+```json
+[
+  "50ca4c95-2ef3-4430-b944-97d2442c3daf"
+]  
+```
+
+**Returns**
+
Returns the list of datanodes that were removed successfully and the list of datanodes that were not found.
+
+```json
+{
+  "removedNodes": {
+    "totalCount": 1,
+    "datanodes": [
+      {
+        "uuid": "50ca4c95-2ef3-4430-b944-97d2442c3daf",
+        "hostname": "ozone-datanode-4.ozone_default",
+        "state": "DEAD",
+        "pipelines": null
+      }
+    ],
+    "message": "Successfully removed 1 datanodes !!!"

Review Comment:
   IMO, we do not need the return message; the list size already tells how many nodes were removed successfully.



##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/NodeEndpoint.java:
##########
@@ -171,4 +177,59 @@ private DatanodeStorageReport getStorageReport(DatanodeDetails datanode) {
     long committed = nodeStat.getCommitted().get();
     return new DatanodeStorageReport(capacity, used, remaining, committed);
   }
+
+  @PUT
+  @Path("/remove")
+  @Consumes(MediaType.APPLICATION_JSON)
+  public Response removeDatanodes(List<String> uuids) {
+    List<DatanodeMetadata> notFoundDatanodes = new ArrayList<>();
+    List<DatanodeMetadata> removedDatanodes = new ArrayList<>();
+
+    Preconditions.checkNotNull(uuids, "Datanode list argument should not be null");
+    Preconditions.checkArgument(!uuids.isEmpty(), "Datanode list argument should not be empty");
+    try {
+      for (String uuid : uuids) {
+        DatanodeDetails nodeByUuid = nodeManager.getNodeByUuid(uuid);
+        try {
+          NodeStatus nodeStatus = nodeManager.getNodeStatus(nodeByUuid);
+          boolean isNodeDecommissioned = nodeByUuid.getPersistedOpState() == NodeOperationalState.DECOMMISSIONED;
+          boolean isNodeInMaintenance = nodeByUuid.getPersistedOpState() == NodeOperationalState.IN_MAINTENANCE;
+          if (isNodeDecommissioned || isNodeInMaintenance || nodeStatus.isDead()) {
+            removedDatanodes.add(DatanodeMetadata.newBuilder()
+                .withHostname(nodeManager.getHostName(nodeByUuid))
+                .withUUid(uuid)
+                .withState(nodeManager.getNodeStatus(nodeByUuid).getHealth())
+                .build());
+            nodeManager.removeNode(nodeByUuid);
+          } else {
+            Response.ResponseBuilder builder = Response.status(Response.Status.BAD_REQUEST);
+            builder.entity("Invalid request: Node: " + uuid + " should be in either DECOMMISSIONED or " +
+                "IN_MAINTENANCE mode or DEAD.");
+            // Build and return the response
+            return builder.build();

Review Comment:
   Here, a few of the nodes are removed, but we are not returning that info. In this case, we need to return both the successfully removed nodes and the failed nodes whose state did not match.
   Or:
   First filter and return an error if there are any invalid nodes, and only then remove and return the successful nodes.
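The reviewer's second option (validate every node first, remove only if all are valid) could be sketched roughly as below. `State`, `invalidNodes`, and `removeAll` are hypothetical stand-ins for illustration, not the real Ozone API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "filter first" approach: validate all requested
// nodes before removing any, so the response never reflects a partial removal.
public class FilterFirstRemoval {
    // Stand-in for Ozone's node states; not the real Ozone enum.
    enum State { DECOMMISSIONED, IN_MAINTENANCE, DEAD, HEALTHY }

    static List<String> invalidNodes(Map<String, State> cluster, List<String> uuids) {
        List<String> invalid = new ArrayList<>();
        for (String uuid : uuids) {
            State s = cluster.get(uuid);
            // Only DECOMMISSIONED, IN_MAINTENANCE or DEAD nodes are removable.
            if (s == null || s == State.HEALTHY) {
                invalid.add(uuid);
            }
        }
        return invalid;
    }

    static List<String> removeAll(Map<String, State> cluster, List<String> uuids) {
        List<String> invalid = invalidNodes(cluster, uuids);
        if (!invalid.isEmpty()) {
            // Reject the whole request up front; nothing has been removed yet.
            throw new IllegalArgumentException("Invalid nodes: " + invalid);
        }
        List<String> removed = new ArrayList<>();
        for (String uuid : uuids) {
            cluster.remove(uuid);
            removed.add(uuid);
        }
        return removed;
    }
}
```

With this shape, a BAD_REQUEST response can list all invalid uuids at once, and a success response only ever reports fully completed removals.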



##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconNodeManager.java:
##########
@@ -321,4 +321,27 @@ public long getNodeDBKeyCount() throws IOException {
       return nodeCount;
     }
   }
+
+  /**
+   * Remove an existing node from the NodeDB. Explicit removal from admin user.
+   * This API call first removes the node info from NodeManager memory and,
+   * if successful, then removes the node from the NODES table as well.
+   *
+   * @param datanodeDetails Datanode details.
+   */
+  @Override
+  public void removeNode(DatanodeDetails datanodeDetails) throws NodeNotFoundException, IOException {
+    try {
+      nodeDB.delete(datanodeDetails.getUuid());
+    } catch (IOException ioException) {
+      LOG.error("Node {} deletion fails from Node DB.", datanodeDetails.getUuid());
+      // retry to delete the node from node db
+      nodeDB.delete(datanodeDetails.getUuid());

Review Comment:
   If the deletion fails, do we really need to retry? Is there any retry logic behind this? IMO, we should return the failure instead.
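Propagating the failure instead of blindly retrying once could look like the sketch below. `NodeTable` is a hypothetical stand-in for Recon's `nodeDB` table, not the real Ozone API:

```java
import java.io.IOException;

// Sketch of the reviewer's suggestion: if the DB delete fails, log and rethrow
// so the endpoint can translate it into an error response, rather than issuing
// a single unconditional retry.
public class RemoveNodeSketch {
    // Hypothetical stand-in for the Recon NODES table.
    interface NodeTable {
        void delete(String uuid) throws IOException;
    }

    static void removeNode(NodeTable nodeDB, String uuid) throws IOException {
        try {
            nodeDB.delete(uuid);
        } catch (IOException e) {
            // Surface the failure to the caller instead of retrying here.
            System.err.println("Node " + uuid + " deletion failed from Node DB.");
            throw e;
        }
    }
}
```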



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/states/NodeStateMap.java:
##########
@@ -456,4 +456,16 @@ private static Predicate<DatanodeInfo> matching(NodeOperationalState state) {
   private static Predicate<DatanodeInfo> matching(NodeState health) {
     return dn -> health.equals(dn.getNodeStatus().getHealth());
   }
+
+  public void removeNode(DatanodeDetails datanodeDetails) throws NodeNotFoundException {
+    lock.writeLock().lock();
+    try {
+      UUID uuid = datanodeDetails.getUuid();
+      getNodeInfo(uuid);

Review Comment:
   What is the reason for this check? Is it only checking whether the node exists? IMO, it is not required; we can blindly remove.
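The "blindly remove" variant could be sketched as below: take the write lock and remove directly, since `Map.remove` is already a no-op for an absent key. The map here is a hypothetical stand-in for `NodeStateMap`'s internal state, not the real Ozone structure:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: remove under the write lock without a prior getNodeInfo() existence check.
public class BlindRemove {
    private final Map<UUID, String> nodeMap = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public void addNode(UUID uuid, String info) {
        lock.writeLock().lock();
        try {
            nodeMap.put(uuid, info);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public void removeNode(UUID uuid) {
        lock.writeLock().lock();
        try {
            nodeMap.remove(uuid); // no-op if the node is absent; no lookup needed
        } finally {
            lock.writeLock().unlock();
        }
    }

    public boolean contains(UUID uuid) {
        lock.readLock().lock();
        try {
            return nodeMap.containsKey(uuid);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The trade-off is that callers no longer get a `NodeNotFoundException` for unknown nodes; whether that signal matters here is the reviewer's open question.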



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeManager.java:
##########
@@ -1642,4 +1642,34 @@ private ReentrantReadWriteLock.WriteLock writeLock() {
   private ReentrantReadWriteLock.ReadLock readLock() {
     return lock.readLock();
   }
+
+  /**
+   * This API allows removal of only DECOMMISSIONED and DEAD nodes from
+   * NodeManager data structures and cleans up memory.
+   * This API call has a pre-condition: before a node is removed, the
+   * following resources related to it must already be removed:
+   *   --- pipelines
+   *   --- containers
+   *   --- network topology
+   *   --- or any other cache related to node context in SCM
+   *
+   * @param datanodeDetails
+   * @throws NodeNotFoundException
+   */
+  @Override
+  public void removeNode(DatanodeDetails datanodeDetails) throws NodeNotFoundException, IOException {
+    writeLock().lock();
+    try {
+      NodeStatus nodeStatus = this.getNodeStatus(datanodeDetails);
+      if (datanodeDetails.isDecommissioned() || nodeStatus.isDead()) {
+        if (clusterMap.contains(datanodeDetails)) {
+          clusterMap.remove(datanodeDetails);
+        }
+        nodeStateManager.removeNode(datanodeDetails);
+        removeFromDnsToUuidMap(datanodeDetails.getUuid(), datanodeDetails.getIpAddress());
+        commandQueue.removeCommand(datanodeDetails.getUuid());
+        scmNodeEventPublisher.fireEvent(SCMEvents.DEAD_NODE, datanodeDetails);

Review Comment:
   For SCM, the dead-node handling should be fired first -- this can be a precondition.
   For Recon, it syncs details from SCM, so the state may change again on the next SCM sync. Do we really need this for this task?



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeManager.java:
##########
@@ -1642,4 +1642,34 @@ private ReentrantReadWriteLock.WriteLock writeLock() {
   private ReentrantReadWriteLock.ReadLock readLock() {
     return lock.readLock();
   }
+
+  /**
+   * This API allows removal of only DECOMMISSIONED and DEAD nodes from
+   * NodeManager data structures and cleans up memory.
+   * This API call has a pre-condition: before a node is removed, the
+   * following resources related to it must already be removed:
+   *   --- pipelines
+   *   --- containers
+   *   --- network topology
+   *   --- or any other cache related to node context in SCM
+   *
+   * @param datanodeDetails
+   * @throws NodeNotFoundException
+   */
+  @Override
+  public void removeNode(DatanodeDetails datanodeDetails) throws NodeNotFoundException, IOException {
+    writeLock().lock();
+    try {
+      NodeStatus nodeStatus = this.getNodeStatus(datanodeDetails);
+      if (datanodeDetails.isDecommissioned() || nodeStatus.isDead()) {

Review Comment:
   Recon also checks IN_MAINTENANCE; please check whether this check is correct.



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeManager.java:
##########
@@ -1642,4 +1642,34 @@ private ReentrantReadWriteLock.WriteLock writeLock() {
   private ReentrantReadWriteLock.ReadLock readLock() {
     return lock.readLock();
   }
+
+  /**
+   * This API allows removal of only DECOMMISSIONED and DEAD nodes from
+   * NodeManager data structures and cleans up memory.
+   * This API call has a pre-condition: before a node is removed, the
+   * following resources related to it must already be removed:
+   *   --- pipelines
+   *   --- containers
+   *   --- network topology
+   *   --- or any other cache related to node context in SCM
+   *
+   * @param datanodeDetails
+   * @throws NodeNotFoundException
+   */
+  @Override
+  public void removeNode(DatanodeDetails datanodeDetails) throws NodeNotFoundException, IOException {
+    writeLock().lock();
+    try {
+      NodeStatus nodeStatus = this.getNodeStatus(datanodeDetails);
+      if (datanodeDetails.isDecommissioned() || nodeStatus.isDead()) {
+        if (clusterMap.contains(datanodeDetails)) {
+          clusterMap.remove(datanodeDetails);
+        }
+        nodeStateManager.removeNode(datanodeDetails);
+        removeFromDnsToUuidMap(datanodeDetails.getUuid(), datanodeDetails.getIpAddress());
+        commandQueue.removeCommand(datanodeDetails.getUuid());
+        scmNodeEventPublisher.fireEvent(SCMEvents.DEAD_NODE, datanodeDetails);

Review Comment:
   Firing DEAD_NODE will try to get info from the node manager and then clean up, so it has to be called before removeNode. So IMO, firing this event here is not required.
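The ordering issue the reviewer describes can be illustrated with a small sketch: the dead-node cleanup reads the node's info from the node manager, so firing it after the node has been removed finds nothing to clean up. All types below are hypothetical stand-ins, not the real Ozone handlers:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Sketch: dead-node handling must run while the node's info is still present.
public class RemovalOrdering {
    private final Map<UUID, String> nodeInfo = new HashMap<>();
    private final List<String> cleanupLog = new ArrayList<>();

    void addNode(UUID uuid, String host) {
        nodeInfo.put(uuid, host);
    }

    // Stand-in for the DEAD_NODE handler: it looks the node up first.
    void handleDeadNode(UUID uuid) {
        String host = nodeInfo.get(uuid);
        if (host != null) {
            cleanupLog.add("cleaned " + host);
        }
        // If the node was already removed, there is nothing left to clean up.
    }

    void removeNode(UUID uuid) {
        nodeInfo.remove(uuid);
    }

    List<String> cleanupLog() {
        return cleanupLog;
    }
}
```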



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/CommandQueue.java:
##########
@@ -141,6 +141,17 @@ public void addCommand(final UUID datanodeUuid, final SCMCommand
     commandsInQueue++;
   }
 
+  /**
+   * Removes all Commands from the SCM Queue for the datanode.
+   *
+   * @param datanodeUuid DatanodeDetails.Uuid
+   */
+  public void removeCommand(final UUID datanodeUuid) {

Review Comment:
   We can use the method below, which gets and removes the commands, so there is no need to expose a new method. The same method is used in the dead-node handler as well:
   org.apache.hadoop.hdds.scm.node.SCMNodeManager#getCommandQueue
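The get-and-remove pattern the reviewer points at could be sketched as below: a single accessor that returns the pending commands and clears the queue entry, making a separate `removeCommand` method redundant. The types are hypothetical stand-ins, not the real Ozone `CommandQueue`:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Sketch: one accessor both drains and returns a datanode's queued commands.
public class CommandQueueSketch {
    private final Map<UUID, List<String>> commands = new HashMap<>();

    public void addCommand(UUID datanodeUuid, String command) {
        commands.computeIfAbsent(datanodeUuid, k -> new ArrayList<>()).add(command);
    }

    /** Returns the pending commands for the datanode and removes them from the queue. */
    public List<String> getCommandQueue(UUID datanodeUuid) {
        List<String> pending = commands.remove(datanodeUuid);
        return pending == null ? Collections.emptyList() : pending;
    }
}
```

Calling the accessor during node removal and discarding the returned list would clear the queue with no new surface area on `CommandQueue`.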



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
