[ https://issues.apache.org/jira/browse/HDDS-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536780#comment-16536780 ]

Elek, Marton commented on HDDS-199:
-----------------------------------

Thanks [~ajayydv] for the additional comments.

1. I started to refactor it to use an ExecutorService after your comment, but it 
became more complex. An ExecutorService is good for handling multiple smaller 
tasks (executorService.submit), but in our case we have one long-running thread 
with only one task. I think it's clearer to use just a thread.
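To illustrate what I mean by "just a thread" (a rough sketch only, not the patch 
code; the class name, the work queue and the thread name are made up here):

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ReplicationLoopSketch implements Runnable {

  // Hypothetical work queue of container ids waiting for re-replication.
  private final BlockingQueue<Long> containerQueue = new LinkedBlockingQueue<>();
  private volatile boolean running = true;

  @Override
  public void run() {
    while (running) {
      try {
        Long containerId = containerQueue.take();
        // ...calculate missing replicas and send the copy commands here...
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        running = false;
      } catch (Exception e) {
        // Log and continue: one failed request must not stop the loop.
      }
    }
  }

  public void start() {
    Thread thread = new Thread(this, "ReplicationMonitor");
    thread.setDaemon(true);
    thread.start();
  }

  public void stop() {
    running = false;
  }
}
{code}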

2. By default the ReplicationManager receives events only for closed containers. 
But you are right, it's better to check it. I added a precondition check on the 
state of the container (as there is a try/catch block inside the main loop, the 
error will be printed out and the loop will continue).
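The check roughly looks like this (illustrative only: the state enum and the 
method are simplified placeholders, the real code uses the container's lifecycle 
state):

{code:java}
import com.google.common.base.Preconditions;

public class ClosedContainerCheckSketch {

  // Placeholder for the real container lifecycle states.
  enum State { OPEN, CLOSING, CLOSED }

  static void handle(long containerId, State state) {
    try {
      // Only closed containers are handled by the ReplicationManager.
      Preconditions.checkState(state == State.CLOSED,
          "Container %s is in state %s, only CLOSED containers are replicated",
          containerId, state);
      // ...select datanodes and send the copy/delete commands...
    } catch (IllegalStateException e) {
      // Printed out by the surrounding loop; processing continues with the next event.
      System.err.println(e.getMessage());
    }
  }
}
{code}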

3. SCMCommonPolicy unit tests: to be honest, I also considered modifying the 
unit tests. The only problem is that there are no unit tests for the policies; 
there is only a higher-level test (TestContainerPlacement) which checks the 
distribution of the containers. But you are right, and your comment convinced 
me. I created two brand new unit tests for the two placement implementations, 
which include a check of the exclude list.
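The core assertion of the new tests is basically the following (sketch only, 
written against a made-up placement interface, not the real SCM placement API):

{code:java}
import static org.junit.Assert.assertTrue;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import org.junit.Test;

public class TestPlacementExcludeSketch {

  // Made-up stand-in for the real placement policy interface.
  interface PlacementPolicySketch {
    List<String> chooseDatanodes(List<String> excludedNodes, int nodesRequired);
  }

  @Test
  public void chosenNodesHonourTheExcludeList() {
    List<String> cluster = Arrays.asList("dn1", "dn2", "dn3", "dn4");
    List<String> excluded = Arrays.asList("dn1", "dn2");

    // Trivial policy, only here to make the sketch self-contained.
    PlacementPolicySketch policy = (excludedNodes, nodesRequired) -> {
      List<String> candidates = new ArrayList<>(cluster);
      candidates.removeAll(excludedNodes);
      Collections.shuffle(candidates);
      return candidates.subList(0, nodesRequired);
    };

    List<String> chosen = policy.chooseDatanodes(excluded, 2);
    assertTrue("excluded nodes must never be selected",
        Collections.disjoint(chosen, excluded));
  }
}
{code}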

4. The other nits are fixed, except the UUID: we can't use the UUID of the 
original replication request, as there is a one-to-many relationship between the 
original replication event and the new tracking events: if multiple replicas are 
missing, we create multiple DatanodeCommands and we need to track them 
one by one. Therefore we need different UUIDs. But thanks for pointing it out: 
in that case we don't need the getUUID in the original ReplicationRequest event, 
as it could not be used.
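The fan-out looks roughly like this (placeholder class names, not the real HDDS 
event/command types):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class ReplicationFanOutSketch {

  // Placeholder for a tracked copy command: target datanode + its own tracking id.
  static final class TrackedCopyCommand {
    final UUID trackingId = UUID.randomUUID();
    final String targetDatanode;

    TrackedCopyCommand(String targetDatanode) {
      this.targetDatanode = targetDatanode;
    }
  }

  // One under-replicated event may fan out to several commands, each with its
  // own UUID so the watcher can confirm them one by one.
  static List<TrackedCopyCommand> createCommands(List<String> selectedDatanodes) {
    List<TrackedCopyCommand> commands = new ArrayList<>();
    for (String datanode : selectedDatanodes) {
      commands.add(new TrackedCopyCommand(datanode));
    }
    return commands;
  }
}
{code}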

The latest patch has been uploaded with all these fixes + the new unit tests.

> Implement ReplicationManager to replicate ClosedContainers
> ----------------------------------------------------------
>
>                 Key: HDDS-199
>                 URL: https://issues.apache.org/jira/browse/HDDS-199
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: SCM
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>             Fix For: 0.2.1
>
>         Attachments: HDDS-199.001.patch, HDDS-199.002.patch, 
> HDDS-199.003.patch, HDDS-199.004.patch, HDDS-199.005.patch
>
>
> HDDS/Ozone supports Open and Closed containers. Under specific conditions 
> (the container is full, or a node has failed) a container will be closed and 
> replicated in a different way. The replication of Open containers is handled 
> with Ratis and the PipelineManager.
> The ReplicationManager should handle the replication of the ClosedContainers. 
> The replication information will be sent as an event 
> (UnderReplicated/OverReplicated). 
> The ReplicationManager will collect all of the events in a priority queue 
> (to replicate first the containers where more replicas are missing), calculate 
> the destination datanode (first with a very simple algorithm, later by 
> calculating the scatter-width) and send the Copy/Delete container command to 
> the datanode (CommandQueue).
> A CopyCommandWatcher/DeleteCommandWatcher is also included to retry the 
> copy/delete in case of failure. This is an in-memory structure (based on 
> HDDS-195) which can requeue the under-replicated/over-replicated events to the 
> priority queue until the confirmation of the copy/delete command arrives.
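Only as an illustration of the priority ordering described above, a minimal 
sketch (ReplicationRequest and its fields here are placeholders, not the real 
HDDS classes):

{code:java}
import java.util.PriorityQueue;

public class ReplicationQueueSketch {

  // Placeholder request: how many replicas are expected vs. actually reported.
  static final class ReplicationRequest {
    final long containerId;
    final int expectedReplicas;
    final int actualReplicas;

    ReplicationRequest(long containerId, int expectedReplicas, int actualReplicas) {
      this.containerId = containerId;
      this.expectedReplicas = expectedReplicas;
      this.actualReplicas = actualReplicas;
    }

    int missingReplicas() {
      return expectedReplicas - actualReplicas;
    }
  }

  public static void main(String[] args) {
    // The more replicas are missing, the earlier the request is dequeued.
    PriorityQueue<ReplicationRequest> queue = new PriorityQueue<>(
        (a, b) -> Integer.compare(b.missingReplicas(), a.missingReplicas()));

    queue.add(new ReplicationRequest(1L, 3, 2)); // one replica missing
    queue.add(new ReplicationRequest(2L, 3, 1)); // two replicas missing

    System.out.println(queue.poll().containerId); // prints 2
  }
}
{code}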


