[ https://issues.apache.org/jira/browse/HDFS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Shvachko updated HDFS-4138:
--------------------------------------
Attachment: hdfs-4138.patch
Kihwal, your analysis of the problem is absolutely correct. There is a race
between startCommonServices(), which initializes metrics, and
runCheckpointDaemon(), which initializes EditLog.
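To make the ordering problem concrete, here is a minimal, single-threaded Java model of it. Everything in it is hypothetical: `ModelEditLog`, its two states, and the `txSinceLastRoll` gauge only mimic the roles described above and are not Hadoop's `FSEditLog` or `FSNamesystem`. The metrics side reads the gauge before the checkpoint side has opened the log, so the read fails. Once the log is opened, the same read succeeds.

```java
import java.util.function.LongSupplier;

// Hypothetical model of the startup ordering problem; not Hadoop code.
final class StartupRaceModel {

    enum State { UNINITIALIZED, IN_SEGMENT }

    /** Stand-in for an edit log that must be opened before it can be queried. */
    static final class ModelEditLog {
        private State state = State.UNINITIALIZED;
        private long curSegmentTxId;
        private long lastWrittenTxId;

        void openSegment(long firstTxId) {
            curSegmentTxId = firstTxId;
            lastWrittenTxId = firstTxId - 1;
            state = State.IN_SEGMENT;
        }

        void logEdit() { lastWrittenTxId++; }

        long getCurSegmentTxId() {
            // Querying the segment before it is open is an error in this model.
            if (state != State.IN_SEGMENT) {
                throw new IllegalStateException("Bad state: " + state);
            }
            return curSegmentTxId;
        }

        long getLastWrittenTxId() { return lastWrittenTxId; }
    }

    /** Unguarded gauge, as a "common services" step might register it. */
    static long txSinceLastRoll(ModelEditLog log) {
        return log.getLastWrittenTxId() - log.getCurSegmentTxId() + 1;
    }

    /** Simulates one metrics poll, reporting a failure instead of crashing. */
    static String poll(LongSupplier gauge) {
        try {
            return Long.toString(gauge.getAsLong());
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        ModelEditLog log = new ModelEditLog();
        LongSupplier gauge = () -> txSinceLastRoll(log); // metrics come up first

        System.out.println(poll(gauge)); // failed: Bad state: UNINITIALIZED

        log.openSegment(1);              // the log is opened later
        log.logEdit();
        log.logEdit();
        System.out.println(poll(gauge)); // 2
    }
}
```

In the real system the two steps run concurrently, so whether the first poll lands before the log is ready depends on timing; the model fixes the bad ordering so it is reproducible.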
I also agree that we should be able to move the initialization of BackupImage,
along with its EditLog, out of registerWith() and into BN.loadNamesystem(), but
this would require some rework of the current code.
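The alternative ordering can be sketched as a hypothetical startup sequence in which the edit log is ready before metrics are registered. The method names echo those mentioned in the comment, but the classes and method bodies are invented for illustration and do not reflect how BackupNode actually starts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "initialize early" alternative; not Hadoop code.
final class InitOrderSketch {

    /** Minimal stand-in for the edit log: only tracks whether it is ready. */
    static final class ModelEditLog {
        private boolean initialized;
        void initialize() { initialized = true; }
        boolean isInitialized() { return initialized; }
    }

    /** Stand-in for a backup node's startup, recording each step it runs. */
    static final class ModelBackupNode {
        final ModelEditLog editLog = new ModelEditLog();
        final List<String> steps = new ArrayList<>();

        // The image and its edit log are set up here, before any services start.
        void loadNamesystem() {
            editLog.initialize();
            steps.add("loadNamesystem: edit log initialized");
        }

        // Metrics come up here; in this ordering the log is already ready.
        void startCommonServices() {
            if (!editLog.isInitialized()) {
                throw new IllegalStateException("metrics started before edit log");
            }
            steps.add("startCommonServices: metrics registered");
        }

        // Registration no longer has to create the edit log in this sketch.
        void registerWith() {
            steps.add("registerWith: no edit log setup needed");
        }

        void start() {
            loadNamesystem();
            startCommonServices();
            registerWith();
        }
    }

    public static void main(String[] args) {
        ModelBackupNode bn = new ModelBackupNode();
        bn.start();
        bn.steps.forEach(System.out::println);
    }
}
```

The trade-off is the one the comment names: the ordering removes the race at its source, but it means restructuring where initialization lives.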
The simplest fix is to modify the condition in
getTransactionsSinceLastLogRoll() as you did in your patch, but without adding
an extra member to FSNamesystem. The attached patch does that.
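The idea of guarding the gauge can be modeled as below. This is a hypothetical sketch, not the attached patch: `ModelEditLog`, `isSegmentOpen()`, and the exact condition are assumptions chosen to show the principle that the gauge reports 0 while the log is not ready, and that the check relies only on state the log already exposes rather than on a new field in the namesystem.

```java
// Hypothetical sketch of a guarded gauge; not the attached patch.
final class GuardedGaugeSketch {

    /** Stand-in edit log that can report whether a segment is open. */
    static final class ModelEditLog {
        private boolean segmentOpen;
        private long curSegmentTxId;
        private long lastWrittenTxId;

        void openSegment(long firstTxId) {
            curSegmentTxId = firstTxId;
            lastWrittenTxId = firstTxId - 1;
            segmentOpen = true;
        }

        void logEdit() { lastWrittenTxId++; }

        boolean isSegmentOpen() { return segmentOpen; }

        long getCurSegmentTxId() {
            if (!segmentOpen) {
                throw new IllegalStateException("no open segment");
            }
            return curSegmentTxId;
        }

        long getLastWrittenTxId() { return lastWrittenTxId; }
    }

    /** Stand-in namesystem: no extra member, it just asks the log. */
    static final class ModelNamesystem {
        final ModelEditLog editLog = new ModelEditLog();

        long getTransactionsSinceLastLogRoll() {
            // Report 0 instead of querying a log that has no open segment yet.
            if (!editLog.isSegmentOpen()) {
                return 0;
            }
            return editLog.getLastWrittenTxId() - editLog.getCurSegmentTxId() + 1;
        }
    }

    public static void main(String[] args) {
        ModelNamesystem ns = new ModelNamesystem();
        System.out.println(ns.getTransactionsSinceLastLogRoll()); // 0 (log not open)
        ns.editLog.openSegment(1);
        ns.editLog.logEdit();
        ns.editLog.logEdit();
        ns.editLog.logEdit();
        System.out.println(ns.getTransactionsSinceLastLogRoll()); // 3
    }
}
```

Reporting 0 during startup is harmless for a metric, which is why a guard in the getter is much less invasive than reordering initialization.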
It becomes a one-line change, although I couldn't resist removing two redundant
fields in BackupNode, which are unused and in any case duplicated in Storage.
I also fixed one warning.
I was able to start BN successfully with this patch.
> BackupNode startup fails due to uninitialized edit log
> ------------------------------------------------------
>
> Key: HDFS-4138
> URL: https://issues.apache.org/jira/browse/HDFS-4138
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha, name-node
> Affects Versions: 2.0.3-alpha
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Attachments: hdfs-4138.patch, hdfs-4138.patch
>
>
> It was noticed through a TestBackupNode.testCheckpointNode failure. When a
> backup node is starting up, it tries to enter the active state and start
> common services. But it fails to start the services and exits, which is
> caught by the exit util.