[jira] [Commented] (HDFS-12387) Ozone: Support Ratis as a first class replication mechanism

Xiaoyu Yao (JIRA) Thu, 07 Sep 2017 16:30:25 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-12387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157850#comment-16157850
 ]


Xiaoyu Yao commented on HDFS-12387:
-----------------------------------

Thanks for working on this. The patch looks good to me overall. Here are some 
comments.

ChunkGroupOutputStream.java
Line 293: comment does not apply and can be removed.
 
 
XceiverClientRatis.java
Line 116-123: can we add some document on this?
Line 122: can we add some error log  including both the datanode and the 
newPeers?
 
 
ContainerOperationClient.java
Line 98: can you add a TODO to update the Pipeline state from ALLOCATED to OPEN
 
Line 162: NIT: commentlog can be removed.
Line 163: NIT:  should we change to debug log or remove it?
 
Line 97/201:  how do we handle if the pipeline state is not in either ALLOCATED 
or OPEN state?  Can we add a precondition assert here to ensure the pipeline is 
in OPEN state?
 
Line 260/299: we should add {{if (LOG.isDebugEnabled(O)) }} .
 
 
BlockContainerInfo.java
 
Line 28: class document needs to be updated, the name of the class itself may 
also need to be udpated as the ContainerInfo is not only for Block svc only 
anymore with this change.
 
Line 64: NIT: "last used time"
Line 80/82: the exceptions are not declared on Line 86
 
 
Pipeline.java
 
Line 62: should we remove containerName from the Pipeline? This can be handled 
in a separate ticket. My understanding is that we will need a pipeline name for 
the container like we defined in Ozone.proto.
 
Line 65/251: lifeCycleStates -> lifeCycleState
 
 
BlockManagerImpl.java
Line 200: can we update the javadoc with the new parameters?
 
Line 252 The size/usage information is not being persisted when allocateBlock 
via BlockManager. Can we add an API on SCM to allow update the container usage 
info? For now, we can still let block manager to call this API to update usage 
before the container report is available. I found the TODO in ContainerMapping 
to update the size via container report. With that, we still could have over 
provision problem wrt. Container size. 
 
Line 233: +1 for the idea of allowing different container selection policies?
 
With the new container selection algorithm, how do we handle more than 20 
clients allocate block at the same time? We may not want to hand over the same 
ALLOCATED containers to different clients for them to create on the DNs? 
Otherwise, clients will need to compete for the container creation on DNs with 
some fails and retry? I see the patch change the default container provision 
size from 5 to 20. This can mitigate the above issue to some extent but cannot 
solve it completely. As we discussed offline, we might need an additional state 
to indicate that an ALLOCATED container has been assigned so that it will not 
be handed over to other clients. 
 
Line 305: can we add WARN or ERROR when the getMachines.size() == 0
 
Line 408-413: can re update the API that support get OPEN containers with type 
and replication factors as parameter?
This can be done with a follow up JIRA?
 
ContainerMapping.java
Line 251: blockreports-> container report
 
ContainerMapping#updateContainerState needs to be updated (missing in the 
change)
It should update the in memory map via 
containerStateManager#updateContainerState 
 
 
ContainerStateManager.java
Line 141: we will need to pass the containerStore from ContainerMapping so that 
the ContainerStateManager can load the containers from persisted store after 
reboot.
 
Line 161: NIT: can we change this OZONE_SCM_BLOCK_SIZE_KEY to 
OZONE_SCM_BLOCK_SIZE_MB
 
Line 193: ContainerKey contains owner,type, replicationFactor and state.  Will 
the look up of container by name with the same container key using priority 
queue be slow? Maybe a LinkedHashMap based LRU will have better performance for 
look up than PriorityQueue here.  This can also avoid the overhead of 
maintaining a new lastUsed time in the containerInfo.
 
Line 468: factor -> replicationFactor
 
 
PipelineSelector.java
Line 74-75: please add TODO STAND_ALONE should not be always set for Ratis. Or 
do we want to use this as a default and update later?
 
 
 
RatisManagerImpl.java
 
 
Line 53/67/194: ratisMembers never got updated. Can you add some comment on the 
expected usage of it? If I understand correctly, we should update ratisMembers 
after line 197 or when the client inform ratisManager that the pipeline has 
been created.
 
Line 110/153/163: do we need to bind pipeline with a containerName? Especially 
with line 110, what if client ask for a pipeline for container2 where we return 
an existing open pipeline for container1. 
 
Line 158: NIT: define const for pipeline prefix "Ratis-"


> Ozone: Support Ratis as a first class replication mechanism
> -----------------------------------------------------------
>
>                 Key: HDFS-12387
>                 URL: https://issues.apache.org/jira/browse/HDFS-12387
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Anu Engineer
>            Assignee: Anu Engineer
>            Priority: Critical
>              Labels: ozoneMerge
>         Attachments: HDFS-12387-HDFS-7240.001.patch
>
>
> Ozone container layer supports pluggable replication policies. This JIRA 
> brings Apache Ratis based replication to Ozone.  Apache Ratis is a java 
> implementation of Raft protocol.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (HDFS-12387) Ozone: Support Ratis as a first class replication mechanism

Reply via email to