[ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807844#comment-16807844
 ] 

Erik Krogen commented on HDFS-14378:
------------------------------------

Hey [~starphin], I haven't taken a detailed look at the patch yet. I don't have 
much historical context here, so before doing so I'd like to get some opinions 
from longer-standing members of the project on the overall idea. Having the 
active do more of these things makes sense to me -- I've never really 
understood why the SbNN is the one rolling the edit logs -- but there may be 
some good reasoning that we are missing. [~shv], [~kihwal], [~ajisakaa] -- do 
any of you have opinions on this?

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-14378
>                 URL: https://issues.apache.org/jira/browse/HDFS-14378
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: star
>            Assignee: star
>            Priority: Minor
>         Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
> implements a first-writer-wins policy to avoid duplicated fsimage downloads. 
> The variable 'isPrimaryCheckPointer' holds the first-writer state, which 
> determines the SNN that will provide the fsimage to the ANN next time. So we 
> have three roles in an NN cluster: the ANN, one primary SNN, and one or more 
> normal SNNs.
>       Since HDFS-12248, there may be more than two primary SNN shortly after 
> an exception occurs. That change handles the scenario in which an SNN does 
> not upload the fsimage on IOE and Interrupted exceptions. Though this does 
> not cause any further functional issues, it is inconsistent. 
>       Furthermore, the edit log may be rolled more frequently than necessary 
> with multiple Standby name nodes (see HDFS-14349). (I'm not so sure about 
> this; I will verify it with unit tests, or anyone could point it out.)
>       Above all, I'm wondering if we could make it simpler with the following 
> changes:
>  * There are only two roles: ANN and SNN.
>  * The ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * The ANN will select an SNN to download the checkpoint from.
> An SNN will just tail the edit log and checkpoint, then provide a servlet for 
> fsimage downloading as normal. An SNN will not try to roll the edit log or 
> send a checkpoint request to the ANN.
> In a word, the ANN will be more active. Suggestions are welcome.
>  
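
For readers following the proposal quoted above, here is a rough, standalone
Java sketch of the two-role idea: the active rolls its own edit log on a fixed
period and picks a standby to pull a checkpoint from. None of these classes
exist in HDFS or in the attached patches; the names (Active, Standby,
maybeRoll, selectCheckpointSource) are hypothetical, and the "pick the standby
with the newest checkpoint" policy is only an assumption, since the description
says only that the ANN selects an SNN.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the proposed two-role design; not HDFS code. */
public class ActiveDrivenCheckpointSketch {

    /** A standby that only tails edits and checkpoints locally. */
    static final class Standby {
        final String name;
        final long lastCheckpointTxId; // highest txid covered by its local fsimage

        Standby(String name, long lastCheckpointTxId) {
            this.name = name;
            this.lastCheckpointTxId = lastCheckpointTxId;
        }
    }

    /** The active drives both the edit log roll and the image download. */
    static final class Active {
        private final long rollPeriodMs; // stands in for the DFS_HA_LOGROLL_PERIOD_KEY setting
        private long lastRollMs;
        private int rollCount;
        private long imageTxId;          // txid covered by the active's current fsimage

        Active(long rollPeriodMs, long nowMs, long imageTxId) {
            this.rollPeriodMs = rollPeriodMs;
            this.lastRollMs = nowMs;
            this.imageTxId = imageTxId;
        }

        /** Roll the edit log only if a full period has elapsed since the last roll. */
        boolean maybeRoll(long nowMs) {
            if (nowMs - lastRollMs < rollPeriodMs) {
                return false;
            }
            lastRollMs = nowMs;
            rollCount++;
            return true;
        }

        /** Assumed policy: choose the standby whose checkpoint is newest and ahead of ours. */
        Optional<Standby> selectCheckpointSource(List<Standby> standbys) {
            Standby best = null;
            for (Standby s : standbys) {
                boolean ahead = s.lastCheckpointTxId > imageTxId;
                if (ahead && (best == null || s.lastCheckpointTxId > best.lastCheckpointTxId)) {
                    best = s;
                }
            }
            return Optional.ofNullable(best);
        }

        /** Simulates fetching the fsimage from the standby's download servlet. */
        void downloadFrom(Standby s) {
            imageTxId = s.lastCheckpointTxId;
        }

        int getRollCount() { return rollCount; }
        long getImageTxId() { return imageTxId; }
    }

    public static void main(String[] args) {
        Active ann = new Active(120_000L, 0L, 150L); // arbitrary period and txid
        List<Standby> standbys = Arrays.asList(
                new Standby("snn1", 100L), new Standby("snn2", 250L));

        ann.maybeRoll(120_000L); // one full period has passed, so this rolls
        ann.selectCheckpointSource(standbys).ifPresent(ann::downloadFrom);
        System.out.println("rolls=" + ann.getRollCount() + " imageTxId=" + ann.getImageTxId());
    }
}
```

In this sketch the standbys never call the active, which is the point of the
proposal: no standby needs to know whether it is the "primary" checkpointer.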



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
