[jira] [Updated] (HDFS-12387) Ozone: Support Ratis as a first class replication mechanism

Anu Engineer (JIRA) Fri, 22 Sep 2017 14:00:24 -0700

     [ 
https://issues.apache.org/jira/browse/HDFS-12387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Anu Engineer updated HDFS-12387:
--------------------------------
    Attachment: HDFS-12387-HDFS-7240.002.patch

[~xyao] and [~nandakumar131] Thanks for the review comments. I have updated the 
patch to address most comments and to the latest tree.

bq. ContainerState Change
Had an offline chat with Nanda and moving this to another JIRA since the fix is 
large and complex.

bq. PipelineSelector#newPipelineFromNodesAt the start of the function I have 
added 
{code}
  Preconditions.checkNotNull(nodes);
  Preconditions.checkArgument(nodes.size() > 0);
 {code}
 Please let me know if you had anything else in mind.


 bq.ChunkGroupOutputStream.java:Line 293: comment does not apply
Fixed.

bq. XceiverClientRatis.java Document and add log
Fixed.



bq. ContainerOperationClient.java: Line 98: can you add a TODO to update the 
Pipeline state from ALLOCATED to OPEN
Fixed.

bq. Line 162: NIT: commentlog can be removed. 
Fixed.

bq. Line 163: NIT:  should we change to debug log or remove it?
Fixed.

Line 97/201:  how do we handle if the pipeline state is not in either ALLOCATED 
or OPEN state?  Can we add a precondition assert here to ensure the pipeline is 
in OPEN state?
Fixed.

 
Line 260/299: we should add {{if (LOG.isDebugEnabled(O)) }} .
Fixed.

 
 
BlockContainerInfo.java
 
bq. Line 28: class document needs to be updated, the name of the class itself 
may also need to be udpated as the ContainerInfo is not only for Block svc only 
anymore with this change.
Fixed the class documentation.
 
bq. Line 64: NIT: "last used time"
Fixed.
bq. Line 80/82: the exceptions are not declared on Line 86
Both of them are runTimeExceptions, so I am copying the class 
comments from the parent. I can remove them if you wish.

 
Pipeline.java
 
bq. Line 62: should we remove containerName from the Pipeline? This can be 
handled in a separate ticket. My understanding is that we will need a pipeline 
name for the container like we defined in Ozone.proto.
Eventually yes, I have left a comment to that effect in ozone.proto. If I 
remove now, the code
churn is too much, so would like to do it another JIRA.
 
bq. Line 65/251: lifeCycleStates -> lifeCycleState
Thanks, Fixed.

 
bq. BlockManagerImpl.java:Line 200: can we update the javadoc with the new 
parameters?
Thanks, Fixed.


bq. Line 252 The size/usage information is not being persisted when 
allocateBlock via BlockManager. Can we add an API on SCM to allow update the 
container usage info? For now, we can still let block manager to call this API 
to update usage before the container report is available. I found the TODO in 
ContainerMapping to update the size via container report. With that, we still 
could have over provision problem wrt. Container size. 
Let us tackle this after we have the container report. That will give us a 
better idea about what to do here.
 
 
bq. With the new container selection algorithm, how do we handle more than 20 
clients allocate block at the same time? We may not want to hand over the same 
ALLOCATED containers to different clients for them to create on the DNs? 

Great Question, we need to have a container lease manager, that will be added 
later. The code does not regress from the current state, since we always used 
to do this work around in the client path.


bq. Otherwise, clients will need to compete for the container creation on DNs 
with some fails and retry? I see the patch change the default container 
provision size from 5 to 20. This can mitigate the above issue to some extent 
but cannot solve it completely. As we discussed offline, we might need an 
additional state to indicate that an ALLOCATED container has been assigned so 
that it will not be handed over to other clients. 
Completely, Agree. I will fix this issue as a follow up patch. I will file a 
JIRA.


bq. Line 305: can we add WARN or ERROR when the getMachines.size() == 0
Fixed.
 
Line 408-413: can re update the API that support get OPEN containers with type 
and replication factors as parameter?
This can be done with a follow up JIRA?
I will file a JIRA for this.

 
ContainerMapping.java
Line 251: blockreports-> container report
 
bq. ContainerMapping#updateContainerState needs to be updated (missing in the 
change)
It should update the in memory map via 
containerStateManager#updateContainerState
Not sure I understand this comment clearly, Skipping for now.
 
 
bq. ContainerStateManager.java Line 141: we will need to pass the 
containerStore from ContainerMapping so that the ContainerStateManager can load 
the containers from persisted store after reboot.
Filed a JIRA to track this issue.

 
bq. Line 161: NIT: can we change this OZONE_SCM_BLOCK_SIZE_KEY to 
OZONE_SCM_BLOCK_SIZE_MB
Fixed.

bq. Line 193: ContainerKey contains owner,type, replicationFactor and state.  
Will the look up of container by name with the same container key using 
priority queue be slow? Maybe a LinkedHashMap based LRU will have better 
performance for look up than PriorityQueue here.  This can also avoid the 
overhead of maintaining a new lastUsed time in the containerInfo.
Filed a JIRA to track this issue.
 
b1. fixed
Line 468: factor -> replicationFactor

 
 
PipelineSelector.java
Line 74-75: please add TODO STAND_ALONE should not be always set for Ratis. Or 
do we want to use this as a default and update later?
Yes, we want to use this as Default. The idea is to bring in the code and fix 
the issues in 
nect few JIRAs.

 
 
 
RatisManagerImpl.java
 
 
bq. Line 53/67/194: ratisMembers never got updated. Can you add some comment on 
the expected usage of it? If I understand correctly, we should update 
ratisMembers after line 197 or when the client inform ratisManager that the 
pipeline has been created.
This is not complete, I will need some one like Tsz or Chen to look at this and 
tell me exactly how we should do this.

 
bq. Line 110/153/163: do we need to bind pipeline with a containerName? 
Especially with line 110, what if client ask for a pipeline for container2 
where we return an existing open pipeline for container1. 
This is not complete, since we need some calls from Ratis layer.

 
bq. Line 158: NIT: define const for pipeline prefix "Ratis-"
Fixed.

> Ozone: Support Ratis as a first class replication mechanism
> -----------------------------------------------------------
>
>                 Key: HDFS-12387
>                 URL: https://issues.apache.org/jira/browse/HDFS-12387
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Anu Engineer
>            Assignee: Anu Engineer
>            Priority: Critical
>              Labels: ozoneMerge
>         Attachments: HDFS-12387-HDFS-7240.001.patch, 
> HDFS-12387-HDFS-7240.002.patch
>
>
> Ozone container layer supports pluggable replication policies. This JIRA 
> brings Apache Ratis based replication to Ozone.  Apache Ratis is a java 
> implementation of Raft protocol.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (HDFS-12387) Ozone: Support Ratis as a first class replication mechanism

Reply via email to