[
https://issues.apache.org/jira/browse/HDDS-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784143#comment-16784143
]
Shashikant Banerjee commented on HDDS-935:
------------------------------------------
Thanks [~jnp] and [~arpitagarwal] for the review. Patch v7 fixes the related
test failures as well as the findbugs and checkstyle issues.
> Avoid creating an already created container on a datanode in case of disk
> removal followed by datanode restart
> --------------------------------------------------------------------------------------------------------------
>
> Key: HDDS-935
> URL: https://issues.apache.org/jira/browse/HDDS-935
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Datanode
> Affects Versions: 0.4.0
> Reporter: Rakesh R
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDDS-935.000.patch, HDDS-935.001.patch,
> HDDS-935.002.patch, HDDS-935.003.patch, HDDS-935.004.patch,
> HDDS-935.005.patch, HDDS-935.006.patch, HDDS-935.007.patch
>
>
> Currently, a container gets created when a writeChunk request reaches
> HddsDispatcher and the container does not already exist. If a disk holding
> a container is removed and the datanode restarts, a subsequent writeChunk
> request might end up creating the same container again with an updated
> BCSID, because the datanode won't detect that the disk was removed. SCM
> won't detect this either, since the recreated container will have the
> latest BCSID. This Jira aims to address this issue.
> The proposed fix is to persist all the containerIds existing in the
> containerSet in the snapshot file whenever a Ratis snapshot is taken. If a
> disk is removed and the datanode is restarted, the container set will be
> rebuilt by scanning all the available disks, while the container list stored
> in the snapshot file will give all the containers created on the datanode.
> The diff between these two gives the exact list of containers which were
> created but were not detected after the restart. Any writeChunk request
> should then validate its container Id against the list of missing
> containers. Also, we need to ensure container creation does not happen as
> part of applyTransaction of a writeChunk request in Ratis.
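
The diff described above can be sketched roughly as follows. This is only an
illustration of the idea, not the code from the attached patches: the class
and method names (MissingContainerCheck, findMissing, isMissing) are
hypothetical, and container IDs are assumed to be plain long values.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not taken from the HDDS-935 patch.
class MissingContainerCheck {

  // Containers recorded in the snapshot file but not found by the disk scan
  // after restart, i.e. snapshotIds minus scannedIds.
  static Set<Long> findMissing(Set<Long> snapshotIds, Set<Long> scannedIds) {
    Set<Long> missing = new TreeSet<>(snapshotIds);
    missing.removeAll(scannedIds);
    return missing;
  }

  // A writeChunk request for a missing container should be rejected
  // instead of silently recreating the container.
  static boolean isMissing(long containerId, Set<Long> missing) {
    return missing.contains(containerId);
  }

  public static void main(String[] args) {
    // IDs persisted with the last snapshot (assumed sample data).
    Set<Long> snapshotIds = new TreeSet<>(Arrays.asList(1L, 2L, 3L, 4L));
    // IDs rebuilt by scanning the disks that are still available.
    Set<Long> scannedIds = new TreeSet<>(Arrays.asList(1L, 3L));

    Set<Long> missing = findMissing(snapshotIds, scannedIds);
    System.out.println(missing);                // prints [2, 4]
    System.out.println(isMissing(2L, missing)); // prints true
    System.out.println(isMissing(5L, missing)); // prints false
  }
}
```

In this sketch, container 5 is not reported as missing because it was never
recorded in the snapshot; only containers that existed at snapshot time and
disappeared after the restart are flagged.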