[ https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686506#comment-16686506 ]
Shashikant Banerjee edited comment on HDDS-801 at 11/14/18 2:24 PM: -------------------------------------------------------------------- Thanks [~nandakumar131] for working on this. In addition to Mukul's comments ,some more comments : 1.KeyValueHandler.java : 865 -> update the comment to be container getting "quasi closed" rather than getting closed. 2.KeyValueHandler.java : 865 -> closeContainer is exposed to clients in ContainerProtocolCalls.Java. With SCMCLi as well, the close container can be invoked where a client can directly close (closeContainer in ContainerOperationClient). In such cases, a container in may be in just open state and hence the exception will be thrown: {code:java} // The container has to be in CLOSING state. if (state != State.CLOSING) { ContainerProtos.Result error = state == State.INVALID ? INVALID_CONTAINER_STATE : CONTAINER_INTERNAL_ERROR; throw new StorageContainerException("Cannot close container #" + container.getContainerData().getContainerID() + " while in " + state + " state.", error); }{code} Should we disallow/remove the closeContainer call exposed to clients/SCMCLI? 3. Any state change in ContainerState should triggerICR.In that case, closeContainer/quasiCloseContainer call should call updateContainerState internally to send ICR instead of executing individually. 4. There can be a case where let's say the SCM gets network separated from a follower before sending a closeCommand but Ratis ring is opeartional. In such case, leader will execute the closeContainer transaction successfully and follower will try to replicate the same but it will fail as the container was never put into closing state in follower before as it was not communicating with SCM. The assumption that container will be in closing state before closeContainer is called may not be necessarily true always. 5. KeyValueContainer.java : 310 -> The comments look misleading here. The first comment specifies the compaction should be done asynchronously as otherwise it will be lot slower . The next comment says it is ok if the operation is slow. Can you please check? {code:java} @Override public void close() throws StorageContainerException { //TODO: writing .container file and compaction can be done // asynchronously, otherwise rpc call for this will take a lot of time to // complete this action ContainerDataProto.State oldState = null; try { writeLock(); oldState = containerData.getState(); containerData.closeContainer(); File containerFile = getContainerFile(); // update the new container data to .container File updateContainerFile(containerFile); } catch (StorageContainerException ex) { if (oldState != null) { // Failed to update .container file. Reset the state to CLOSING containerData.setState(oldState); } throw ex; } finally { writeUnlock(); } // It is ok if this operation takes a bit of time. // Close container is not expected to be instantaneous. compactDB(); } {code} was (Author: shashikant): Thanks [~nandakumar131] for working on this. In addition to Mukul's comments ,some more comments : 1.KeyValueHandler.java : 865 -> update the comment to be container getting "quasi closed" rather than getting closed. 2.KeyValueHandler.java : 865 -> closeContainer is exposed to clients in ContainerProtocolCalls.Java. With SCMCLi as well, the close container can be invoked where a client can directly close (closeContainer in ContainerOperationClient). In such cases, a container in may be in just open state and hence the exception will be thrown: {code:java} // The container has to be in CLOSING state. if (state != State.CLOSING) { ContainerProtos.Result error = state == State.INVALID ? INVALID_CONTAINER_STATE : CONTAINER_INTERNAL_ERROR; throw new StorageContainerException("Cannot close container #" + container.getContainerData().getContainerID() + " while in " + state + " state.", error); }{code} Should we disallow/remove the closeContainer call exposed to clients/SCMCLI? 3. Any state change in ContainerState should triggerICR.In that case, closeContainer/quasiCloseContainer call should call updateContainerState internally to send ICR instead of executing individually. 4. There can be a case where let's say the SCM gets network separated from a follower before sending a closeCommand but Ratis ring is opeartional. In such case, leader will execute the closeContainer transaction successfully and follower will try to replicate the same but it will fail as the container was never put into closing state in follower before as it was not communicating with SCM. The assumption that container will be in closing state before closeContainer is called may not be necessarily true always. 5. KeyValueContainer.java : 310 -> The comments look misleading here. The first comment specifies we compaction should be done asynchronously as otherwise it will be lot slower . The next comment says it is ok if thhe opeartion is slow. Can you please check? {code:java} @Override public void close() throws StorageContainerException { //TODO: writing .container file and compaction can be done // asynchronously, otherwise rpc call for this will take a lot of time to // complete this action ContainerDataProto.State oldState = null; try { writeLock(); oldState = containerData.getState(); containerData.closeContainer(); File containerFile = getContainerFile(); // update the new container data to .container File updateContainerFile(containerFile); } catch (StorageContainerException ex) { if (oldState != null) { // Failed to update .container file. Reset the state to CLOSING containerData.setState(oldState); } throw ex; } finally { writeUnlock(); } // It is ok if this operation takes a bit of time. // Close container is not expected to be instantaneous. compactDB(); } {code} > Quasi close the container when close is not executed via Ratis > -------------------------------------------------------------- > > Key: HDDS-801 > URL: https://issues.apache.org/jira/browse/HDDS-801 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode > Affects Versions: 0.3.0 > Reporter: Nanda kumar > Assignee: Nanda kumar > Priority: Major > Attachments: HDDS-801.000.patch, HDDS-801.001.patch, > HDDS-801.002.patch > > > When datanode received CloseContainerCommand and the replication type is not > RATIS, we should QUASI close the container. After quasi-closing the container > an ICR has to be sent to SCM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org