[ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925622#comment-16925622
 ] 

star edited comment on HDFS-14378 at 9/9/19 11:51 AM:
------------------------------------------------------

Thanks [~jojochuang] for the review and the advice.

Section 7.6 of HDFS-1073 highlights the problem with multiple NNs. HDFS-6440 
did not address edit log rolling; it only avoids multiple fsimage uploads by 
tracking a 'primary checkpointer' status among multiple SNNs.

I'd like to create two sub-JIRAs, one for edit log rolling and one for fsimage 
downloading. The ANN will roll its own edit logs. For the fsimage, I see two 
options:
 # Each SNN does its own checkpoint, and the ANN downloads the fsimage from a 
randomly selected SNN.
 # The ANN issues a checkpoint command to the SNNs through a special edit log 
op like "OP_ROLLING_UPGRADE_START", then downloads the fsimage from a randomly 
selected SNN.

[~jojochuang], [~tlipcon], what's your opinion?
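
To make the "randomly selected SNN" step in both options concrete, here is a 
minimal sketch of how the ANN side might choose a download source. It is plain 
Java, not Hadoop code: the class StandbyImageSourceSelector, the Standby 
holder and its fields are hypothetical names, and the filter on checkpoint 
transaction id is my own assumption about what "eligible" would mean.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;

/** Hypothetical helper for the ANN; not part of Hadoop. */
final class StandbyImageSourceSelector {

    /** Hypothetical view of one SNN as seen by the ANN. */
    static final class Standby {
        final String httpAddress;
        final long lastCheckpointTxId;

        Standby(String httpAddress, long lastCheckpointTxId) {
            this.httpAddress = httpAddress;
            this.lastCheckpointTxId = lastCheckpointTxId;
        }
    }

    private final Random random;

    StandbyImageSourceSelector(Random random) {
        this.random = random;
    }

    /**
     * Picks one SNN uniformly at random among those whose checkpoint is
     * newer than the image the ANN already has (assumed eligibility rule).
     * Returns empty when no SNN has a newer checkpoint.
     */
    Optional<Standby> select(List<Standby> standbys, long activeImageTxId) {
        List<Standby> candidates = new ArrayList<>();
        for (Standby s : standbys) {
            if (s.lastCheckpointTxId > activeImageTxId) {
                candidates.add(s);
            }
        }
        if (candidates.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(candidates.get(random.nextInt(candidates.size())));
    }
}
```

With this shape no SNN needs a 'primary checkpointer' flag: every SNN that has 
a fresh enough checkpoint is an equally valid source, and the ANN alone 
decides which one to pull from.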


> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-14378
>                 URL: https://issues.apache.org/jira/browse/HDFS-14378
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: ha, namenode
>    Affects Versions: 3.1.2
>            Reporter: star
>            Assignee: star
>            Priority: Major
>         Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, 
> HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch, 
> HDFS-14378-trunk.005.patch, HDFS-14378-trunk.006.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
> implements a first-writer-wins policy to avoid duplicate fsimage downloads. 
> The variable 'isPrimaryCheckPointer' holds the first-writer state, which 
> determines the SNN that will provide the fsimage to the ANN next time. This 
> gives three roles in an NN cluster: the ANN, one primary SNN, and one or more 
> normal SNNs.
>       Since HDFS-12248, there may be more than two primary SNNs shortly after 
> an exception occurs. That change handles the scenario where an SNN will not 
> upload the fsimage on IOE and Interrupted exceptions. Though this causes no 
> further functional issues, it is inconsistent.
>       Furthermore, the edit log may be rolled more frequently than necessary 
> with multiple Standby NameNodes, see HDFS-14349. (I'm not sure about this; I 
> will verify it with unit tests, or someone could confirm it.)
>       Given all of the above, I'm wondering if we could simplify things with 
> the following changes:
>  * There are only two roles: ANN and SNN.
>  * The ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * The ANN will select an SNN to download the checkpoint from.
> An SNN will just tail the edit log and checkpoint, then provide a servlet for 
> fsimage downloading as usual. An SNN will not try to roll the edit log or 
> send a checkpoint request to the ANN.
> In short, the ANN will be more active. Suggestions are welcome.
>  
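
The proposed split of duties, where only the ANN rolls the edit log on a 
timer, could be sketched roughly as below. This is plain Java and not Hadoop 
code: ActiveEditLogRoller, isActive and rollEditLog are hypothetical names, 
and the period is assumed to be read by the caller from the configuration key 
behind DFS_HA_LOGROLL_PERIOD_KEY.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical timer-driven roller that only acts on the active NN. */
final class ActiveEditLogRoller {
    private final BooleanSupplier isActive;   // assumed HA state check
    private final Runnable rollEditLog;       // assumed roll action

    ActiveEditLogRoller(BooleanSupplier isActive, Runnable rollEditLog) {
        this.isActive = isActive;
        this.rollEditLog = rollEditLog;
    }

    /** One timer tick: roll only if this NN is active. Returns true if rolled. */
    boolean tick() {
        if (!isActive.getAsBoolean()) {
            return false; // an SNN never rolls the edit log in this design
        }
        rollEditLog.run();
        return true;
    }

    /** Schedules tick() every periodSeconds; the caller shuts the timer down. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            try {
                tick();
            } catch (RuntimeException e) {
                // Swallow so one failed roll does not cancel later ticks.
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return timer;
    }
}
```

Because the same class can run on every NN and simply does nothing while the 
node is standby, a failover needs no extra coordination: the new ANN starts 
rolling on its next tick.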


