[ https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569063#comment-13569063 ]
Chris Nauroth commented on HDFS-4462:
-------------------------------------
Hi, Aaron. The code looks good. I applied the patch to branch-2 and ran
multiple test suites related to checkpoints and the 2NN.
{code}
- boolean isSameCluster(FSImage si) {
- return namespaceID == si.getStorage().namespaceID &&
- clusterID.equals(si.getClusterID()) &&
- blockpoolID.equals(si.getBlockPoolID());
+ boolean namespaceIdMatches(FSImage si) {
+ return namespaceID == si.getStorage().namespaceID;
}
{code}
Because the namespace ID is an integer, whereas the cluster ID is based on a
GUID, comparing only the namespace ID seems more likely to produce an
accidental collision. In that case, {{CheckpointSignature#validateStorageInfo}}
could misidentify a match. It's still highly unlikely, but non-zero.
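To put rough numbers on that: if IDs are drawn uniformly at random, the chance that two unrelated IDs coincide is 2^-bits. This sketch assumes the namespace ID is a random positive int (about 31 bits) and that the cluster ID is "CID-" followed by a random version-4 UUID (122 random bits). The cluster ID in the log quoted in the issue does carry the version-4 marker, which {{java.util.UUID#version()}} can confirm. {{IdCollision}} and its members are hypothetical helpers, not HDFS code.

{code}
import java.util.UUID;

// Hypothetical helper: estimates chance collisions between two unrelated IDs,
// assuming each ID is drawn uniformly at random.
final class IdCollision {
  // A random (version 4) UUID has 122 random bits; the remaining 6 bits are
  // fixed version/variant markers.
  static final int UUID_V4_RANDOM_BITS = 122;
  // Assumption: the namespace ID is a random positive int, i.e. about 31 bits.
  static final int NAMESPACE_ID_RANDOM_BITS = 31;

  // Probability that two independent uniform draws from 2^bits values are equal.
  static double pairCollisionProbability(int bits) {
    return Math.pow(2, -bits);
  }

  // Returns the UUID version of a cluster ID of the form "CID-<uuid>".
  static int clusterIdUuidVersion(String clusterId) {
    String uuid = clusterId.startsWith("CID-") ? clusterId.substring(4) : clusterId;
    return UUID.fromString(uuid).version();
  }
}
{code}

Under these assumptions, a chance namespace ID match is on the order of 10^-10 per pair, versus roughly 10^-37 for the cluster ID: tiny either way, but many orders of magnitude apart.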
I'm wondering if a safer change would be (pseudo-code):
{code}
if namespace ID + cluster ID + blockpool ID are defined on both
compare all 3 fields
else if only namespace ID is defined on one of them
compare only namespace ID
{code}
This would keep the logic unchanged for upgrades between two post-federation
versions and change it only for the pre-fed -> post-fed case.
Or am I being too paranoid? :-)
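As a rough illustration only, the pseudo-code might translate to Java along these lines. {{StorageIds}}, {{CheckpointIdMatcher}}, and {{idsMatch}} are hypothetical names, not HDFS APIs. The sketch assumes, as the "Expecting respectively: -19; 403832480; 0; ; ." line in the issue's log suggests, that pre-federation storage reports empty cluster and block pool IDs. The pseudo-code leaves open the case where only some federation IDs are set; the sketch conservatively treats that as a mismatch.

{code}
// Hypothetical holder for the three storage identifiers; not an HDFS class.
final class StorageIds {
  final int namespaceId;
  final String clusterId;   // assumed empty ("") on pre-federation layouts
  final String blockPoolId; // assumed empty ("") on pre-federation layouts

  StorageIds(int namespaceId, String clusterId, String blockPoolId) {
    this.namespaceId = namespaceId;
    this.clusterId = clusterId == null ? "" : clusterId;
    this.blockPoolId = blockPoolId == null ? "" : blockPoolId;
  }

  // True when both federation IDs are defined.
  boolean hasFederationIds() {
    return !clusterId.isEmpty() && !blockPoolId.isEmpty();
  }

  // True when only the namespace ID is defined.
  boolean isPreFederation() {
    return clusterId.isEmpty() && blockPoolId.isEmpty();
  }
}

final class CheckpointIdMatcher {
  // Compare all three IDs when both sides have them; fall back to the
  // namespace ID alone only when one side predates federation.
  static boolean idsMatch(StorageIds a, StorageIds b) {
    if (a.hasFederationIds() && b.hasFederationIds()) {
      return a.namespaceId == b.namespaceId
          && a.clusterId.equals(b.clusterId)
          && a.blockPoolId.equals(b.blockPoolId);
    }
    if (a.isPreFederation() || b.isPreFederation()) {
      return a.namespaceId == b.namespaceId;
    }
    // Partially populated IDs: not covered by the pseudo-code; rejected here by assumption.
    return false;
  }
}
{code}

With this shape, two post-federation versions keep the full three-field comparison, so the wider namespace-ID-only check is confined to the pre-fed -> post-fed upgrade.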
> 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
> ---------------------------------------------------------------------------------------
>
> Key: HDFS-4462
> URL: https://issues.apache.org/jira/browse/HDFS-4462
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.0.2-alpha
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
> Attachments: HDFS-4462.patch, HDFS-4462.patch
>
>
> The 2NN currently has logic to detect when its on-disk FS metadata needs an
> upgrade relative to the NN's metadata (i.e., the layout versions differ). In
> that case it proceeds with the checkpoint even though the storage signatures
> don't match precisely, as long as the BP ID and Cluster ID match exactly.
> However, when upgrading from versions of HDFS prior to federation, which had
> no BP IDs or Cluster IDs, checkpoints will always fail with an error like the
> following:
> {noformat}
> 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent
> checkpoint fields.
> LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId =
> CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId =
> BP-1520616013-172.21.3.106-1359680537136.
> Expecting respectively: -19; 403832480; 0; ; .
> {noformat}
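The failure in the quoted log can be reproduced with a minimal model of the check the issue describes. {{StorageSignature}}, {{CurrentCheck}}, and {{canCheckpoint}} are hypothetical names, and the sketch assumes, per the description, that the upgrade path compares only the cluster ID and block pool ID; it is not the actual 2NN code.

{code}
// Hypothetical model of a storage signature; fields mirror the log
// ("LV", "namespaceID", "cTime", "clusterId", "blockpoolId"), not HDFS classes.
final class StorageSignature {
  final int layoutVersion;
  final int namespaceId;
  final long cTime;
  final String clusterId;
  final String blockPoolId;

  StorageSignature(int layoutVersion, int namespaceId, long cTime,
                   String clusterId, String blockPoolId) {
    this.layoutVersion = layoutVersion;
    this.namespaceId = namespaceId;
    this.cTime = cTime;
    this.clusterId = clusterId;
    this.blockPoolId = blockPoolId;
  }

  boolean matchesExactly(StorageSignature o) {
    return layoutVersion == o.layoutVersion
        && namespaceId == o.namespaceId
        && cTime == o.cTime
        && clusterId.equals(o.clusterId)
        && blockPoolId.equals(o.blockPoolId);
  }
}

final class CurrentCheck {
  // Proceed on an exact match, or when the layout versions differ (an upgrade
  // is needed) and the cluster ID and block pool ID match exactly.
  static boolean canCheckpoint(StorageSignature nn, StorageSignature local) {
    if (nn.matchesExactly(local)) {
      return true;
    }
    boolean needsUpgrade = nn.layoutVersion != local.layoutVersion;
    return needsUpgrade
        && nn.clusterId.equals(local.clusterId)
        && nn.blockPoolId.equals(local.blockPoolId);
  }
}
{code}

Fed the values from the log (NN: LV -40 with a CID/BP ID; 2NN: LV -19 with empty IDs), {{canCheckpoint}} returns false even though the namespace IDs agree, which is the failure the issue reports.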
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira