mbrodmann commented on a change in pull request #85: URL: https://github.com/apache/incubator-crail/pull/85#discussion_r578428717
##########
File path: namenode/src/main/java/org/apache/crail/namenode/BlockStore.java
##########
@@ -171,9 +192,33 @@ short addDataNode(DataNodeBlocks dataNode) {
 		return RpcErrors.ERR_OK;
 	}
-
+
+	short prepareForRemovalDatanode(DataNodeInfo dn) throws Exception {
+		// this will only mark it for removal
+		return prepareOrRemoveDN(dn, true);
+	}
+
+	short removeDatanode(DataNodeInfo dn) throws Exception {
+		// this will remove it as well
+		return prepareOrRemoveDN(dn, false);
+	}
+
 	//---------------
-
+
+	private short prepareOrRemoveDN(DataNodeInfo dn, boolean onlyMark) throws Exception {
+		DataNodeBlocks toBeRemoved = membership.get(dn.key());
+		if (toBeRemoved == null) {
+			LOG.error("DataNode: " + dn.toString() + " not found");
+			return RpcErrors.ERR_DATANODE_NOT_REGISTERED;
+		} else {
+			if (onlyMark)
+				toBeRemoved.scheduleForRemoval();
+			else
+				membership.remove(toBeRemoved.key());

Review comment:
I'm not sure I fully understand the problem you are referring to. Could you rephrase it?

My understanding is this: when a datanode is forcefully removed (which can also happen independently of this PR, e.g. by killing it), we run into the problem you mentioned, namely that certain blocks of files might be missing. When a client requests a file, the namenode points it to the corresponding datanodes. Because the datanode was forcefully shut down, the client application cannot retrieve the missing blocks. Am I missing a point here?

To prevent this situation, the idea was to allow the actual removal of a datanode only when it no longer stores any blocks.
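To make the intended rule concrete, here is a minimal sketch of "mark first, remove only once the node holds no blocks". It is not the code in this PR and not Crail's `BlockStore`: the class `DatanodeRemovalSketch`, its methods, the string keys, the per-node block counter, and the status codes are all hypothetical names chosen for illustration. It also assumes that a node marked for removal receives no new blocks, which the diff above does not show.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the namenode's datanode bookkeeping (not Crail's BlockStore).
class DatanodeRemovalSketch {
    // Hypothetical status codes, loosely mirroring the RpcErrors style.
    static final short OK = 0;
    static final short ERR_NOT_REGISTERED = 1;
    static final short ERR_BLOCKS_REMAINING = 2;

    private static final class Node {
        int usedBlocks;               // blocks of files still stored on this datanode
        boolean scheduledForRemoval;  // set by prepareForRemoval
        Node(int usedBlocks) { this.usedBlocks = usedBlocks; }
    }

    private final Map<String, Node> membership = new HashMap<>();

    void register(String key, int usedBlocks) {
        membership.put(key, new Node(usedBlocks));
    }

    // Phase 1: only mark the datanode; it stays in the membership map.
    short prepareForRemoval(String key) {
        Node n = membership.get(key);
        if (n == null) return ERR_NOT_REGISTERED;
        n.scheduledForRemoval = true;
        return OK;
    }

    // Assumption: a marked node gets no new blocks, so its count can only drain.
    boolean canAllocateOn(String key) {
        Node n = membership.get(key);
        return n != null && !n.scheduledForRemoval;
    }

    // Called when a file block on this datanode is freed.
    void releaseBlock(String key) {
        Node n = membership.get(key);
        if (n != null && n.usedBlocks > 0) n.usedBlocks--;
    }

    // Phase 2: remove only if no client could still be pointed at this node.
    short remove(String key) {
        Node n = membership.get(key);
        if (n == null) return ERR_NOT_REGISTERED;
        if (n.usedBlocks > 0) return ERR_BLOCKS_REMAINING;
        membership.remove(key);
        return OK;
    }
}
```

With this shape, a `remove` call on a datanode that still stores blocks is refused instead of leaving files with unreachable blocks; the caller would retry once the blocks have been freed.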