[ 
https://issues.apache.org/jira/browse/HDFS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-4138:
--------------------------------------

    Attachment: hdfs-4138.patch

Kihwal, your analysis of the problem is absolutely correct. There is a race 
between startCommonServices(), which initializes metrics, and 
runCheckpointDaemon(), which initializes EditLog.
I also agree that we should be able to move the initialization of BackupImage, 
along with its EditLog, out of registerWith() and into BN.loadNamesystem(), but 
this will require some rework of the current code.
The simplest way is to modify the condition in 
getTransactionsSinceLastLogRoll() as you did in your patch, except that we 
should avoid adding an additional member to FSNamesystem. I did that in the 
attached patch.
It becomes a one-line change, although I couldn't help also removing two 
redundant fields in BackupNode, which are unused and duplicated in Storage 
anyway, and fixing one warning.
I was able to start BN successfully with this patch.
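
For illustration only, here is a minimal self-contained sketch of the kind of
guard discussed above. It is not the attached patch and does not use the real
HDFS classes: SketchEditLog, SketchNamesystem, isSegmentOpen() and the other
method names are hypothetical stand-ins. The assumption is that the metrics
getter can be called before the edit log has an open segment, so it should
report 0 instead of failing.

```java
public class EditLogGuardSketch {

    /** Hypothetical stand-in for an edit log; not the HDFS class. */
    static final class SketchEditLog {
        private boolean segmentOpen = false;
        private long curSegmentTxId = 0;
        private long lastWrittenTxId = 0;

        /** Opens a segment; until this runs the log is "uninitialized". */
        synchronized void startLogSegment(long firstTxId) {
            curSegmentTxId = firstTxId;
            lastWrittenTxId = firstTxId - 1;
            segmentOpen = true;
        }

        /** Records one transaction in the open segment. */
        synchronized void logEdit() {
            if (!segmentOpen) {
                throw new IllegalStateException("no open log segment");
            }
            lastWrittenTxId++;
        }

        synchronized boolean isSegmentOpen() {
            return segmentOpen;
        }

        /** Fails if called before a segment is open, like the bug scenario. */
        synchronized long getCurSegmentTxId() {
            if (!segmentOpen) {
                throw new IllegalStateException("no open log segment");
            }
            return curSegmentTxId;
        }

        synchronized long getLastWrittenTxId() {
            return lastWrittenTxId;
        }
    }

    /** Hypothetical stand-in for the namesystem that exposes the metric. */
    static final class SketchNamesystem {
        private final SketchEditLog editLog = new SketchEditLog();

        SketchEditLog getEditLog() {
            return editLog;
        }

        /** Metric getter: safe to call before the edit log is initialized. */
        long getTransactionsSinceLastLogRoll() {
            SketchEditLog log = getEditLog();
            synchronized (log) {
                // The guard: no extra state in the namesystem, just ask the log.
                if (!log.isSegmentOpen()) {
                    return 0;
                }
                return log.getLastWrittenTxId() - log.getCurSegmentTxId() + 1;
            }
        }
    }

    public static void main(String[] args) {
        SketchNamesystem ns = new SketchNamesystem();
        // Metrics may be polled before the checkpoint thread opens the log.
        System.out.println(ns.getTransactionsSinceLastLogRoll());
        ns.getEditLog().startLogSegment(5);
        ns.getEditLog().logEdit();
        System.out.println(ns.getTransactionsSinceLastLogRoll());
    }
}
```

The point of the sketch is that the check lives in the getter and relies on
state the edit log already tracks, so no new member is needed in the
namesystem class.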
                
> BackupNode startup fails due to uninitialized edit log
> ------------------------------------------------------
>
>                 Key: HDFS-4138
>                 URL: https://issues.apache.org/jira/browse/HDFS-4138
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, name-node
>    Affects Versions: 2.0.3-alpha
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>         Attachments: hdfs-4138.patch, hdfs-4138.patch
>
>
> This was noticed through a TestBackupNode.testCheckpointNode failure. When a 
> backup node starts up, it tries to enter the active state and start common 
> services. But it fails to start the services and exits, which is caught by 
> the exit util.

