[
https://issues.apache.org/jira/browse/HDDS-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530587#comment-16530587
]
Ajay Kumar edited comment on HDDS-199 at 7/2/18 11:47 PM:
----------------------------------------------------------
[~elek] thanks for working on this. A few suggestions:
* Move {{ReplicateContainerCommand}}, {{ReplicateCommandWatcher}} and
{{ReplicationManager}} from {{org.apache.hadoop.hdds.scm.container}} to
{{org.apache.hadoop.ozone.container.replication}}
* Rename suggestion: {{ReplicateCommandWatcher}} to
{{ReplicationCommandWatcher}}
* {{ReplicateContainerCommand}}:
** L65-67: Probably move this to some Pb-util class; we might have to do this
conversion in other places as well (see the first sketch after this list).
** L75-L79: Using a stream here may be less efficient than a plain loop,
especially since the list size is pretty small (see the second sketch after
this list).
* {{SCMCommonPolicy}}:
** Since we are not doing anything with the excluded nodes for the time being,
we should add a TODO comment and maybe file a Jira to handle it later.
* {{ReplicationQueue}}:
** L65: Update the documentation for {{take}}, as it will no longer return null.
** L37, L45, L55, L65, L69: We should synchronize the peek/remove and add
operations (see the synchronization sketch after this list). Currently our
{{ReplicationManager}} seems to be single-threaded, but that may change.
* {{ReplicationRequest}}:
** L65:
* {{ReplicationManager}}:
** L165: With HDDS-175 we will not get the pipeline from {{containerInfo}}.
** L75: Rename suggestion: {{containerStateMap}} to {{containerStateMgr}}, to
avoid confusion between {{ContainerStateManager}} and {{ContainerStateMap}}.
** L220: {{getUUID}} returns null.
* {{ScmConfigKeys}}: add a default value for {{HDDS_SCM_WATCHER_TIMEOUT}}
(i.e. {{HDDS_SCM_WATCHER_TIMEOUT_DEFAULT}}); see the config sketch after this
list.
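Rough sketch of what the Pb-util helper could look like (the class name
{{ProtoUtils}} and the generic shape are my assumptions, not the patch's
actual code):
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical PB-util helper: converts a list of protobuf messages to the
 * matching domain objects through one reusable method instead of repeating
 * the conversion loop at every call site.
 */
public final class ProtoUtils {
  private ProtoUtils() { }

  /** Converts each proto element with the given converter function. */
  public static <P, T> List<T> fromProtoList(List<P> protos,
      Function<P, T> converter) {
    List<T> result = new ArrayList<>(protos.size());
    for (P proto : protos) {
      result.add(converter.apply(proto));
    }
    return result;
  }
}
{code}
A call site could then look like
{{ProtoUtils.fromProtoList(proto.getDatanodesList(), DatanodeDetails::getFromProtoBuf)}}
(assuming a converter method with that shape is available).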
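On the stream vs. loop point, a self-contained illustration of the
difference, using {{String}} as a stand-in for the real proto types:
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SmallListConversion {
  public static void main(String[] args) {
    // Replica lists are typically only ~3 elements.
    List<String> replicas = Arrays.asList("dn1", "dn2", "dn3");

    // Stream version: readable, but builds a pipeline and a collector
    // for a three-element list.
    List<String> upper1 = replicas.stream()
        .map(String::toUpperCase)
        .collect(Collectors.toList());

    // Plain loop: no intermediate machinery, trivially cheap for small lists.
    List<String> upper2 = new ArrayList<>(replicas.size());
    for (String r : replicas) {
      upper2.add(r.toUpperCase());
    }

    System.out.println(upper1.equals(upper2)); // true
  }
}
{code}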
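For the {{ReplicationQueue}} synchronization, a minimal sketch of the idea
(method names and the placeholder {{Request}} element type are assumptions;
this does not mirror the patch's exact API):
{code:java}
import java.util.PriorityQueue;
import java.util.Queue;

/**
 * Sketch: every operation that touches the underlying queue takes the same
 * monitor, so peek/remove and add cannot interleave even if
 * ReplicationManager later becomes multi-threaded.
 */
public class SynchronizedReplicationQueue {

  /** Placeholder request type; the real one must be Comparable. */
  public static class Request implements Comparable<Request> {
    final int priority;
    Request(int priority) { this.priority = priority; }
    @Override
    public int compareTo(Request o) {
      return Integer.compare(priority, o.priority);
    }
  }

  private final Queue<Request> queue = new PriorityQueue<>();

  public synchronized void add(Request req) { queue.add(req); }

  public synchronized Request peek() { return queue.peek(); }

  public synchronized Request poll() { return queue.poll(); }

  public synchronized boolean remove(Request req) { return queue.remove(req); }
}
{code}
(Alternatively, {{java.util.concurrent.PriorityBlockingQueue}} would give the
same safety without explicit locking.)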
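And for {{ScmConfigKeys}}, the usual key/default pair; the key string and the
placeholder value below are assumptions, not taken from the patch:
{code:java}
public final class ScmConfigKeys {
  public static final String HDDS_SCM_WATCHER_TIMEOUT =
      "hdds.scm.watcher.timeout"; // key string assumed, not verified
  // Suggested companion default; 10 minutes is only a placeholder:
  public static final String HDDS_SCM_WATCHER_TIMEOUT_DEFAULT = "10m";

  private ScmConfigKeys() { }
}
{code}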
> Implement ReplicationManager to replicate ClosedContainers
> ----------------------------------------------------------
>
> Key: HDDS-199
> URL: https://issues.apache.org/jira/browse/HDDS-199
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: SCM
> Reporter: Elek, Marton
> Assignee: Elek, Marton
> Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-199.001.patch, HDDS-199.002.patch
>
>
> HDDS/Ozone supports Open and Closed containers. Under specific
> conditions (the container is full, or the node has failed) the container will
> be closed and will be replicated in a different way. The replication of Open
> containers is handled with Ratis and the PipelineManager.
> The ReplicationManager should handle the replication of the ClosedContainers.
> The replication information will be sent as an event
> (UnderReplicated/OverReplicated).
> The ReplicationManager will collect all of the events in a priority queue
> (to replicate first the containers with the most missing replicas), calculate
> the destination datanode (first with a very simple algorithm, later by
> calculating scatter-width), and send the Copy/Delete container command to the
> datanode (CommandQueue).
> A CopyCommandWatcher/DeleteCommandWatcher is also included to retry the
> copy/delete in case of failure. This is an in-memory structure (based on
> HDDS-195) which can re-queue the UnderReplicated/OverReplicated events to the
> priority queue until the confirmation of the copy/delete command arrives.
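A minimal, self-contained sketch of the priority ordering described above,
under the assumption that each request carries expected and actual replica
counts (class and field names are illustrative, not the patch's actual code):
{code:java}
import java.util.PriorityQueue;

/**
 * Replication request ordered so that containers missing more replicas are
 * taken from the queue first.
 */
public class ReplicationRequest implements Comparable<ReplicationRequest> {
  private final long containerId;
  private final int expectedReplicas;
  private final int actualReplicas;

  public ReplicationRequest(long containerId, int expectedReplicas,
      int actualReplicas) {
    this.containerId = containerId;
    this.expectedReplicas = expectedReplicas;
    this.actualReplicas = actualReplicas;
  }

  @Override
  public int compareTo(ReplicationRequest other) {
    int missing = expectedReplicas - actualReplicas;
    int otherMissing = other.expectedReplicas - other.actualReplicas;
    // More missing replicas sorts first (smallest in the ordering).
    return Integer.compare(otherMissing, missing);
  }

  public static void main(String[] args) {
    PriorityQueue<ReplicationRequest> queue = new PriorityQueue<>();
    queue.add(new ReplicationRequest(1L, 3, 2)); // one replica missing
    queue.add(new ReplicationRequest(2L, 3, 1)); // two replicas missing
    System.out.println(queue.poll().containerId); // prints 2
  }
}
{code}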