[
https://issues.apache.org/jira/browse/HDDS-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530587#comment-16530587
]
Ajay Kumar edited comment on HDDS-199 at 7/2/18 11:47 PM:
----------------------------------------------------------
[~elek] thanks for working on this. A few suggestions:
* Move {{ReplicateContainerCommand}}, {{ReplicateCommandWatcher}} and
{{ReplicationManager}} from {{org.apache.hadoop.hdds.scm.container}} to
{{org.apache.hadoop.ozone.container.replication}}
* Rename suggestion: {{ReplicateCommandWatcher}} to
{{ReplicationCommandWatcher}}
* {{ReplicateContainerCommand}}:
** L65-67: Probably move this to some Pb-util class; we might have to do this
conversion in other places as well (see the first sketch after this list).
** L75-L79: Using a stream here may be less efficient than a plain loop,
especially since the list size is pretty small (see the second sketch after
this list).
* {{SCMCommonPolicy}}:
** Since we are not doing anything with the excluded nodes for the time being,
we should add a TODO comment and maybe file a Jira to handle it later.
* {{ReplicationQueue}}:
** L65: Update the documentation for {{take}}, as it will no longer return null.
** L37, L45, L55, L65, L69: We should synchronize the peek/remove and add
operations (see the synchronization sketch after this list). Currently our
{{ReplicationManager}} seems to be single-threaded, but that may change.
* {{ReplicationRequest}}:
** L65:
* {{ReplicationManager}}:
** L165: With HDDS-175 we will not get the pipeline from {{containerInfo}}.
** L75: Rename suggestion: {{containerStateMap}} to {{containerStateMgr}}, to
avoid confusion between {{ContainerStateManager}} and {{ContainerStateMap}}.
** L220: {{getUUID}} returns null.
* {{ScmConfigKeys}}: add a default value for {{HDDS_SCM_WATCHER_TIMEOUT}}
(i.e. {{HDDS_SCM_WATCHER_TIMEOUT_DEFAULT}}); see the config sketch after this
list.
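Rough sketch of what the Pb-util helper could look like (the class name
{{ProtoUtils}} and the generic shape are my assumptions, not the patch's
actual code):
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical PB-util helper: converts a list of protobuf messages to the
 * matching domain objects through one reusable method instead of repeating
 * the conversion loop at every call site.
 */
public final class ProtoUtils {
  private ProtoUtils() { }

  /** Converts each proto element with the given converter function. */
  public static <P, T> List<T> fromProtoList(List<P> protos,
      Function<P, T> converter) {
    List<T> result = new ArrayList<>(protos.size());
    for (P proto : protos) {
      result.add(converter.apply(proto));
    }
    return result;
  }
}
{code}
A call site could then look like
{{ProtoUtils.fromProtoList(proto.getDatanodesList(), DatanodeDetails::getFromProtoBuf)}}
(assuming a converter method with that shape is available).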
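On the stream vs. loop point, a self-contained illustration of the
difference, using {{String}} as a stand-in for the real proto types:
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SmallListConversion {
  public static void main(String[] args) {
    // Replica lists are typically only ~3 elements.
    List<String> replicas = Arrays.asList("dn1", "dn2", "dn3");

    // Stream version: readable, but builds a pipeline and a collector
    // for a three-element list.
    List<String> upper1 = replicas.stream()
        .map(String::toUpperCase)
        .collect(Collectors.toList());

    // Plain loop: no intermediate machinery, trivially cheap for small lists.
    List<String> upper2 = new ArrayList<>(replicas.size());
    for (String r : replicas) {
      upper2.add(r.toUpperCase());
    }

    System.out.println(upper1.equals(upper2)); // true
  }
}
{code}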
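For the {{ReplicationQueue}} synchronization, a minimal sketch of the idea
(method names and the placeholder {{Request}} element type are assumptions;
this does not mirror the patch's exact API):
{code:java}
import java.util.PriorityQueue;
import java.util.Queue;

/**
 * Sketch: every operation that touches the underlying queue takes the same
 * monitor, so peek/remove and add cannot interleave even if
 * ReplicationManager later becomes multi-threaded.
 */
public class SynchronizedReplicationQueue {

  /** Placeholder request type; the real one must be Comparable. */
  public static class Request implements Comparable<Request> {
    final int priority;
    Request(int priority) { this.priority = priority; }
    @Override
    public int compareTo(Request o) {
      return Integer.compare(priority, o.priority);
    }
  }

  private final Queue<Request> queue = new PriorityQueue<>();

  public synchronized void add(Request req) { queue.add(req); }

  public synchronized Request peek() { return queue.peek(); }

  public synchronized Request poll() { return queue.poll(); }

  public synchronized boolean remove(Request req) { return queue.remove(req); }
}
{code}
(Alternatively, {{java.util.concurrent.PriorityBlockingQueue}} would give the
same safety without explicit locking.)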
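And for {{ScmConfigKeys}}, the usual key/default pair; the key string and the
placeholder value below are assumptions, not taken from the patch:
{code:java}
public final class ScmConfigKeys {
  public static final String HDDS_SCM_WATCHER_TIMEOUT =
      "hdds.scm.watcher.timeout"; // key string assumed, not verified
  // Suggested companion default; 10 minutes is only a placeholder:
  public static final String HDDS_SCM_WATCHER_TIMEOUT_DEFAULT = "10m";

  private ScmConfigKeys() { }
}
{code}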
> Implement ReplicationManager to replicate ClosedContainers
> ----------------------------------------------------------
>
> Key: HDDS-199
> URL: https://issues.apache.org/jira/browse/HDDS-199
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: SCM
> Reporter: Elek, Marton
> Assignee: Elek, Marton
> Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-199.001.patch, HDDS-199.002.patch
>
>
> HDDS/Ozone supports Open and Closed containers. Under specific
> conditions (the container is full, or the node has failed) the container will
> be closed and will be replicated in a different way. The replication of Open
> containers is handled with Ratis and the PipelineManager.
> The ReplicationManager should handle the replication of the ClosedContainers.
> The replication information will be sent as an event
> (UnderReplicated/OverReplicated).
> The ReplicationManager will collect all of the events in a priority queue
> (to replicate first the containers with the most missing replicas), calculate
> the destination datanode (first with a very simple algorithm, later by
> calculating scatter-width), and send the Copy/Delete container command to the
> datanode (CommandQueue).
> A CopyCommandWatcher/DeleteCommandWatcher is also included to retry the
> copy/delete in case of failure. This is an in-memory structure (based on
> HDDS-195) which can re-queue the UnderReplicated/OverReplicated events to the
> priority queue until the confirmation of the copy/delete command arrives.
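A minimal, self-contained sketch of the priority ordering described above,
under the assumption that each request carries expected and actual replica
counts (class and field names are illustrative, not the patch's actual code):
{code:java}
import java.util.PriorityQueue;

/**
 * Replication request ordered so that containers missing more replicas are
 * taken from the queue first.
 */
public class ReplicationRequest implements Comparable<ReplicationRequest> {
  private final long containerId;
  private final int expectedReplicas;
  private final int actualReplicas;

  public ReplicationRequest(long containerId, int expectedReplicas,
      int actualReplicas) {
    this.containerId = containerId;
    this.expectedReplicas = expectedReplicas;
    this.actualReplicas = actualReplicas;
  }

  @Override
  public int compareTo(ReplicationRequest other) {
    int missing = expectedReplicas - actualReplicas;
    int otherMissing = other.expectedReplicas - other.actualReplicas;
    // More missing replicas sorts first (smallest in the ordering).
    return Integer.compare(otherMissing, missing);
  }

  public static void main(String[] args) {
    PriorityQueue<ReplicationRequest> queue = new PriorityQueue<>();
    queue.add(new ReplicationRequest(1L, 3, 2)); // one replica missing
    queue.add(new ReplicationRequest(2L, 3, 1)); // two replicas missing
    System.out.println(queue.poll().containerId); // prints 2
  }
}
{code}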