[jira] [Commented] (HDDS-2010) PipelineID management for multi-raft, in SCM or in datanode?

Xiaoyu Yao (Jira) Wed, 28 Aug 2019 20:06:09 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918239#comment-16918239
 ]


Xiaoyu Yao commented on HDDS-2010:
----------------------------------

I would prefer option 1 (store in the datanode) for better scalability. Also, 
SCM always has its in-memory pipeline map, built from the pipeline reports 
sent by DNs.
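
To illustrate the point about the in-memory map, here is a minimal sketch of 
how a map could be rebuilt purely from per-datanode pipeline reports. It is 
not the actual SCM code: the class name PipelineMapSketch, the method names, 
and the use of plain strings for datanode and pipeline ids are all 
assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the real SCM classes.
class PipelineMapSketch {
    // pipeline id -> datanodes whose latest report lists that pipeline
    private final Map<String, Set<String>> membersByPipeline = new HashMap<>();
    // datanode id -> pipeline ids from that datanode's latest report
    private final Map<String, Set<String>> pipelinesByDatanode = new HashMap<>();

    // Apply a full report from one datanode; it replaces the previous report.
    synchronized void onPipelineReport(String datanodeId, Set<String> reported) {
        Set<String> previous =
            pipelinesByDatanode.getOrDefault(datanodeId, Collections.emptySet());
        // Drop memberships the datanode no longer reports.
        for (String pipelineId : previous) {
            if (!reported.contains(pipelineId)) {
                Set<String> members = membersByPipeline.get(pipelineId);
                if (members != null) {
                    members.remove(datanodeId);
                    if (members.isEmpty()) {
                        membersByPipeline.remove(pipelineId);
                    }
                }
            }
        }
        // Record the memberships in the new report.
        for (String pipelineId : reported) {
            membersByPipeline
                .computeIfAbsent(pipelineId, k -> new HashSet<>())
                .add(datanodeId);
        }
        pipelinesByDatanode.put(datanodeId, new HashSet<>(reported));
    }

    // Snapshot of the datanodes currently known to be in a pipeline.
    synchronized Set<String> membersOf(String pipelineId) {
        return new HashSet<>(
            membersByPipeline.getOrDefault(pipelineId, Collections.emptySet()));
    }
}
```

With a structure like this, the datanode stays the source of truth for its 
pipeline ids, and SCM can still derive a global view from the reports alone.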

 

We also need another Jira that changes the current pipeline creation logic:

Currently, SCM talks directly to the DNs to create a pipeline, on the 
assumption that pending reads/writes will need the pipeline right afterwards. 

We should move pipeline creation/destruction into the DN heartbeat 
response model. That would give SCM better scalability. 
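
A rough sketch of what the heartbeat response model could look like: SCM 
only queues pipeline commands per datanode, and each datanode picks up its 
commands in the response to its next heartbeat. The names below 
(HeartbeatCommandQueueSketch, Command, Type, enqueue, onHeartbeat) are 
hypothetical and chosen for the example; they are not the existing Ozone 
APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of a per-datanode command queue drained on heartbeat.
class HeartbeatCommandQueueSketch {
    enum Type { CREATE_PIPELINE, CLOSE_PIPELINE }

    static final class Command {
        final Type type;
        final String pipelineId;

        Command(Type type, String pipelineId) {
            this.type = type;
            this.pipelineId = pipelineId;
        }
    }

    // datanode id -> commands waiting for that datanode's next heartbeat
    private final Map<String, Queue<Command>> pending = new HashMap<>();

    // SCM side: record the intent; no direct call to the datanode is made.
    synchronized void enqueue(String datanodeId, Command command) {
        pending.computeIfAbsent(datanodeId, k -> new ArrayDeque<>()).add(command);
    }

    // Called while handling a heartbeat: returns the queued commands in
    // FIFO order and clears the queue for that datanode.
    synchronized List<Command> onHeartbeat(String datanodeId) {
        Queue<Command> queue = pending.remove(datanodeId);
        return queue == null ? new ArrayList<>() : new ArrayList<>(queue);
    }
}
```

The trade-off in this sketch is latency: a pipeline is created at most one 
heartbeat interval after SCM decides on it, in exchange for SCM never 
blocking on a call to a datanode.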

 

cc: [~anu].

> PipelineID management for multi-raft, in SCM or in datanode?
> ------------------------------------------------------------
>
>                 Key: HDDS-2010
>                 URL: https://issues.apache.org/jira/browse/HDDS-2010
>             Project: Hadoop Distributed Data Store
>          Issue Type: New Feature
>          Components: Ozone Datanode
>            Reporter: Li Cheng
>            Assignee: Li Cheng
>            Priority: Major
>             Fix For: 0.5.0
>
>
> With the intention of supporting multi-raft, I want to raise the question of 
> how pipeline unique ids should be managed. Since every datanode can be a 
> member of multiple raft pipelines, the pipeline ids need to be persisted with 
> the datanode for recovery purposes (we can talk about recovery later). 
> Generally, there are two options:
>  # Store in the datanode (like datanodeDetails): every time the pipeline 
> mapping changes on a single datanode, the pipeline ids are serialized to a 
> local file. This leads to many more local serializations of things like 
> datanodeDetails, but each update only covers changes on the local datanode. 
> One improvement would be to link a serializable object to datanodeDetails 
> and have the datanode write new pipeline ids to that object instead of the 
> details file. On the other hand, since the pipeline ids are stored only 
> locally on the datanode, there will be no global view in SCM (or we can 
> store a lazy copy?).
>  # Store in SCM. SCM can maintain a large mapping between datanode ids and 
> pipeline ids. But this leads to an exponentially increasing frequency of SCM 
> updates, since pipeline mapping changes are far more complex and happen all 
> the time. Obviously this puts too much pressure on SCM, but it also gives SCM 
> a global view for managing datanodes and multi-raft pipelines. 
>  
> Thoughts? [~xyao] [~Sammi] 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
