[ https://issues.apache.org/jira/browse/HDDS-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535207#comment-16535207 ]

Ajay Kumar commented on HDDS-199:
---------------------------------

[~elek] thanks for updating the patch. On a second look at ReplicationManager, I 
thought of having an ExecutorPool inside it whose size is configuration driven 
(instead of it being a runnable thread). Its default size may be 1, but it will 
give us the flexibility to dial it up if required. Not sure if this is overkill, 
as a single thread might be sufficient to handle all replica-related work even 
in a busy, big cluster. Any thoughts on this?
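A minimal sketch of what I have in mind (class and method names here are mine for illustration, not from the patch; the thread count would come from configuration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the configurable-pool idea: a fixed-size executor whose
// size is read from configuration, defaulting to 1 so the current
// single-threaded behavior is preserved.
class ReplicationExecutorSketch {

  private final ExecutorService executor;

  ReplicationExecutorSketch(int threadCount) {
    // Guard against a misconfigured value of 0 or less.
    this.executor = Executors.newFixedThreadPool(Math.max(1, threadCount));
  }

  void submitReplicationTask(Runnable task) {
    executor.submit(task);
  }

  // Drain outstanding work and stop the pool.
  void shutdownAndWait() throws InterruptedException {
    executor.shutdown();
    executor.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```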

A few more nits:
 * ReplicationManager
 ** L81: pipelineSelector can be removed.
 ** L200: the ReplicationRequestToRepeat constructor takes a UUID as a 
parameter; can't we use the ReplicationRequest UUID instead? (i.e., we can 
remove the extra parameter and field and have an API to return 
ReplicationRequest#getUUID)
 ** Add javadoc for class ReplicationRequestToRepeat.
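To illustrate the constructor nit, roughly this shape (these are hypothetical stand-ins, not the actual classes in the patch):

```java
import java.util.UUID;

// Hypothetical stand-in for the request class; the point is that it
// already owns the UUID and exposes it via an accessor.
class ReplicationRequest {
  private final UUID uuid = UUID.randomUUID();

  UUID getUUID() {
    return uuid;
  }
}

/** Wraps a replication request that may need to be re-sent later. */
class ReplicationRequestToRepeat {
  private final ReplicationRequest request;

  // Single-parameter constructor: no duplicated UUID field to keep in sync.
  ReplicationRequestToRepeat(ReplicationRequest request) {
    this.request = request;
  }

  UUID getId() {
    return request.getUUID();
  }
}
```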
{quote}That's a very hard question. IMHO there is no easy way to get the 
current datanodes after HDDS-175, as there is no container -> datanode[] 
mapping for the closed containers. Do you know where this information available 
after HDDS-175? (I rebased the patch but can't return with{quote}
[HDDS-228] should give us the means to find out the replicas of a given 
container. We might have to check that we are not adding any replication 
request for open (RATIS) containers.
{quote} fixed only the SCMContainerPlacementRandom.java and not the 
SCMCommonPolicy.java. Instead of todo, now it should be handled.{quote}
Shall we add a test case to validate that excluded nodes are not returned?
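Something along these lines could cover it (a toy stand-in for the placement call, only to show the shape of the test; names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy stand-in for a placement policy: pick up to `count` healthy nodes,
// skipping anything in the excluded set. The test to add would assert
// that no excluded node ever appears in the result.
class PlacementSketch {
  static List<String> chooseNodes(List<String> healthy,
      Set<String> excluded, int count) {
    List<String> chosen = new ArrayList<>();
    for (String node : healthy) {
      if (chosen.size() == count) {
        break;
      }
      if (!excluded.contains(node)) {
        chosen.add(node);
      }
    }
    return chosen;
  }
}
```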


> Implement ReplicationManager to replicate ClosedContainers
> ----------------------------------------------------------
>
>                 Key: HDDS-199
>                 URL: https://issues.apache.org/jira/browse/HDDS-199
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: SCM
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>             Fix For: 0.2.1
>
>         Attachments: HDDS-199.001.patch, HDDS-199.002.patch, 
> HDDS-199.003.patch, HDDS-199.004.patch
>
>
> HDDS/Ozone supports Open and Closed containers. Under specific 
> conditions (the container is full, or a node has failed) the container will 
> be closed and replicated in a different way. The replication of Open 
> containers is handled with Ratis and PipelineManager.
> The ReplicationManager should handle the replication of the ClosedContainers. 
> The replication information will be sent as an event 
> (UnderReplicated/OverReplicated). 
> The ReplicationManager will collect all of the events in a priority queue 
> (to replicate first the containers where more replicas are missing), 
> calculate the destination datanode (first with a very simple algorithm, 
> later by calculating scatter-width), and send the Copy/Delete container 
> command to the datanode (CommandQueue).
> A CopyCommandWatcher/DeleteCommandWatcher is also included to retry the 
> copy/delete in case of failure. This is an in-memory structure (based on 
> HDDS-195) which can requeue the under-replicated/over-replicated events to 
> the priority queue until the confirmation of the copy/delete command arrives.
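The priority queue described above could look roughly like this (class and field names are illustrative only, not from the patch):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch of the event queue: containers missing more replicas are
// replicated first.
class ReplicationEvent {
  final long containerId;
  final int replicasMissing;

  ReplicationEvent(long containerId, int replicasMissing) {
    this.containerId = containerId;
    this.replicasMissing = replicasMissing;
  }
}

class ReplicationQueueSketch {
  private final PriorityQueue<ReplicationEvent> queue = new PriorityQueue<>(
      Comparator.comparingInt((ReplicationEvent e) -> e.replicasMissing)
          .reversed());

  void offer(ReplicationEvent event) {
    queue.offer(event);
  }

  // Next event to act on, or null when empty; a failed copy/delete
  // would be re-offered by the command watcher.
  ReplicationEvent poll() {
    return queue.poll();
  }
}
```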



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
