[
https://issues.apache.org/jira/browse/HDDS-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777360#comment-16777360
]
Arpit Agarwal commented on HDDS-935:
------------------------------------
Hi [~shashikant], just took a look at this patch. A few thoughts:
# Nitpick: Let's use try-with-resources for the fileInputStream in loadSnapshot.
# Same with takeSnapshot: we should use try-with-resources for the output
stream.
# Dispatcher#init - Init methods are best avoided, though sometimes that is
not possible. In this case it looks like we are calling init twice on the
HddsDispatcher at startup. Is there any way we can improve this?
# Looks like createContainerSet is not additive, i.e. it only tracks
containers created since the last restart. Should this include containers
created previously, as recorded in the Ratis snapshot?
# Possibly dumb question: Why do we add the container set to
DispatcherContext? We just need to update this set once the container is
successfully created, right?
# We can add a bit more detail to this log message, e.g. that the container
has been lost and cannot be recreated on this DataNode.
{code}
if (getMissingContainerSet().contains(containerID)) {
StorageContainerException sce = new StorageContainerException(
"ContainerID " + containerID + " is missing",
{code}
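To illustrate the try-with-resources suggestion for loadSnapshot and takeSnapshot, here is a minimal sketch. It is not the code from the patch: the class name ContainerIdSnapshot, the method signatures, and the on-disk layout (a count followed by long IDs) are all assumptions made for the example.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not the HDDS-935 patch code.
final class ContainerIdSnapshot {
  private ContainerIdSnapshot() { }

  // Writes the container IDs as a count followed by each ID (assumed layout).
  // try-with-resources closes the stream even if a write throws.
  static void takeSnapshot(Path file, Set<Long> containerIds)
      throws IOException {
    try (DataOutputStream out =
             new DataOutputStream(Files.newOutputStream(file))) {
      out.writeInt(containerIds.size());
      for (long id : containerIds) {
        out.writeLong(id);
      }
    }
  }

  // Reads the IDs back; the input stream is closed on every exit path.
  static Set<Long> loadSnapshot(Path file) throws IOException {
    Set<Long> ids = new TreeSet<>();
    try (DataInputStream in =
             new DataInputStream(Files.newInputStream(file))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        ids.add(in.readLong());
      }
    }
    return ids;
  }
}
```

Neither method needs an explicit finally block or close() call, which is the main point of the nitpick.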
> Avoid creating an already created container on a datanode in case of disk
> removal followed by datanode restart
> --------------------------------------------------------------------------------------------------------------
>
> Key: HDDS-935
> URL: https://issues.apache.org/jira/browse/HDDS-935
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Datanode
> Affects Versions: 0.4.0
> Reporter: Rakesh R
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDDS-935.000.patch, HDDS-935.001.patch,
> HDDS-935.002.patch, HDDS-935.003.patch, HDDS-935.004.patch, HDDS-935.005.patch
>
>
> Currently, a container gets created when a writeChunk request comes to
> HddsDispatcher and the container does not already exist. If a disk holding
> a container is removed and the datanode restarts, a subsequent writeChunk
> request might end up creating the same container again with an updated
> BCSID, because the datanode won't detect that the disk was removed. SCM
> won't detect this either, as the container will have the latest BCSID. This
> Jira aims to address this issue.
> The proposed fix is to persist all the containerIds existing in the
> containerSet in the snapshot file whenever a Ratis snapshot is taken. If a
> disk is removed and the datanode gets restarted, the container set will be
> rebuilt by scanning all the available disks, and the container list stored
> in the snapshot file will give all the containers created on the datanode.
> The diff between these two gives the exact list of containers which were
> created but were not detected after the restart. Any writeChunk request
> should now validate its container ID against the list of missing
> containers. Also, we need to ensure container creation does not happen as
> part of applyTransaction of the writeChunk request in Ratis.
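The set difference described in the issue could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the class MissingContainerCheck and the method validateForWrite are hypothetical names, and IllegalStateException stands in for StorageContainerException so that the example stays self-contained.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the "missing containers" check; names are assumed.
final class MissingContainerCheck {
  private final Set<Long> missing;

  // missing = IDs recorded in the Ratis snapshot minus IDs found by the
  // disk scan after restart.
  MissingContainerCheck(Set<Long> idsFromSnapshot, Set<Long> idsFoundOnDisk) {
    Set<Long> diff = new HashSet<>(idsFromSnapshot);
    diff.removeAll(idsFoundOnDisk);
    this.missing = Collections.unmodifiableSet(diff);
  }

  Set<Long> getMissingContainerSet() {
    return missing;
  }

  // Rejects a write to a container that existed before the restart but is
  // no longer on any disk. IllegalStateException is a stand-in for
  // StorageContainerException.
  void validateForWrite(long containerID) {
    if (missing.contains(containerID)) {
      throw new IllegalStateException("ContainerID " + containerID
          + " has been lost and cannot be recreated on this DataNode");
    }
  }
}
```

A writeChunk handler would call validateForWrite before deciding whether to create the container, so a lost container is reported instead of being silently recreated with a fresh BCSID.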