[jira] [Commented] (HDDS-199) Implement ReplicationManager to handle underreplication of closed containers

Elek, Marton (JIRA) Fri, 20 Jul 2018 05:39:13 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550734#comment-16550734
 ]


Elek, Marton commented on HDDS-199:
-----------------------------------

{quote}

Shall we rename {{SCMEvents.REPLICATION_COMPLETE}} to something like 
{{REPLICATION_STATUS}}? Also, we might have to override the default logic of 
EventWatcher, as this replication status object might have three possible 
statuses (i.e. PENDING, EXECUTED, FAILED). The default method in EventWatcher 
only handles EXECUTED.

{quote}

Good question [~ajayydv]. I would address it in a separate jira, as this patch 
keeps getting bigger. 
 # Currently the EventWatcher listens for a single completion event and does 
not filter on the status carried by that event.
 # It could be modified, but in that case I would also rename the 
EventWatcher.completionEvent field and maybe modify the structure (this is the 
reason why I prefer to do it in a separate task; it's more of an 
EventWatcher adjustment).
 # Another option is to modify the SCMDatanodeHeartbeatDispatcher to send 
different internal events based on the status of the CommandStatusReport. One 
big advantage of this approach is that the different types of results (failed 
closing, executed closing) will be visible on the EventQueue monitoring 
interface (e.g. the number of failed closing events instead of the number of 
command status reports).
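
To make the third option a bit more concrete, here is a minimal, self-contained sketch of a dispatcher that fires a different internal event for each reported status. It is only an illustration: the class names, the event names (REPLICATION_EXECUTED, REPLICATION_FAILED) and the RecordingQueue stand-in are hypothetical and are not the real SCMDatanodeHeartbeatDispatcher, CommandStatusReport or EventQueue APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: none of these types are the real HDDS classes. */
class StatusDispatchSketch {

  /** The three statuses mentioned in the quoted comment. */
  enum Status { PENDING, EXECUTED, FAILED }

  /** Assumed internal event names, one per outcome worth monitoring. */
  enum InternalEvent { REPLICATION_EXECUTED, REPLICATION_FAILED }

  /** Stand-in for one entry of a command status report. */
  static final class CommandStatus {
    final long commandId;
    final Status status;

    CommandStatus(long commandId, Status status) {
      this.commandId = commandId;
      this.status = status;
    }
  }

  /** Stand-in for an event queue: it only records what was fired. */
  static final class RecordingQueue {
    final List<String> fired = new ArrayList<>();

    void fireEvent(InternalEvent event, long commandId) {
      fired.add(event + ":" + commandId);
    }
  }

  /** Maps each reported status to its own event; PENDING fires nothing. */
  static void dispatch(List<CommandStatus> report, RecordingQueue queue) {
    for (CommandStatus cs : report) {
      switch (cs.status) {
        case EXECUTED:
          queue.fireEvent(InternalEvent.REPLICATION_EXECUTED, cs.commandId);
          break;
        case FAILED:
          queue.fireEvent(InternalEvent.REPLICATION_FAILED, cs.commandId);
          break;
        default:
          // PENDING: the command is still in flight, so nothing is fired.
          break;
      }
    }
  }
}
```

With one event type per outcome, a monitoring view that counts events per type would show failed and executed commands separately, without any status filtering inside the watcher.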

 

> Implement ReplicationManager to handle underreplication of closed containers
> ----------------------------------------------------------------------------
>
>                 Key: HDDS-199
>                 URL: https://issues.apache.org/jira/browse/HDDS-199
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: SCM
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>             Fix For: 0.2.1
>
>         Attachments: HDDS-199.001.patch, HDDS-199.002.patch, 
> HDDS-199.003.patch, HDDS-199.004.patch, HDDS-199.005.patch, 
> HDDS-199.006.patch, HDDS-199.007.patch, HDDS-199.008.patch, 
> HDDS-199.009.patch, HDDS-199.010.patch, HDDS-199.011.patch, HDDS-199.012.patch
>
>
> HDDS/Ozone supports Open and Closed containers. Under specific 
> conditions (the container is full, a node has failed) the container will be 
> closed and will be replicated in a different way. The replication of Open 
> containers is handled with Ratis and PipelineManager.
> The ReplicationManager should handle the replication of the Closed 
> containers. The replication information will be sent as an event 
> (UnderReplicated/OverReplicated). 
> The ReplicationManager will collect all of the events in a priority queue 
> (to replicate first the containers with the most missing replicas), calculate 
> the destination datanode (first with a very simple algorithm, later by 
> calculating scatter-width) and send the Copy/Delete container command to the 
> datanode (CommandQueue).
> A CopyCommandWatcher and a DeleteCommandWatcher are also included to retry 
> the copy/delete in case of failure. This is an in-memory structure (based on 
> HDDS-195) which can requeue the underreplicated/overreplicated events to the 
> priority queue unless the confirmation of the copy/delete command has arrived.
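
The ordering described in the issue summary quoted above (replicate first the containers with the most missing replicas) can be sketched with a plain java.util.PriorityQueue. This is only a minimal illustration under assumptions: ReplicationRequest and its fields are hypothetical names, not the classes from the attached patches.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical sketch of the priority ordering; not the patch's code. */
class ReplicationQueueSketch {

  /** Assumed request shape: a container plus expected and actual replica counts. */
  static final class ReplicationRequest {
    final long containerId;
    final int expectedReplicas;
    final int actualReplicas;

    ReplicationRequest(long containerId, int expectedReplicas,
        int actualReplicas) {
      this.containerId = containerId;
      this.expectedReplicas = expectedReplicas;
      this.actualReplicas = actualReplicas;
    }

    /** Number of replicas that still have to be created. */
    int missingReplicas() {
      return expectedReplicas - actualReplicas;
    }
  }

  /** Creates a queue that polls the request with the most missing replicas first. */
  static PriorityQueue<ReplicationRequest> newQueue() {
    return new PriorityQueue<>(
        Comparator.comparingInt((ReplicationRequest r) -> r.missingReplicas())
            .reversed());
  }
}
```

A watcher that gets no confirmation for a copy command could simply add the same request to this queue again, and it would be picked up according to its current number of missing replicas.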



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
