[ 
https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569063#comment-13569063
 ] 

Chris Nauroth commented on HDFS-4462:
-------------------------------------

Hi, Aaron.  The code looks good.  I applied the patch to branch-2 and ran 
multiple test suites related to checkpoints and 2NN.

{code}
-  boolean isSameCluster(FSImage si) {
-    return namespaceID == si.getStorage().namespaceID &&
-      clusterID.equals(si.getClusterID()) &&
-      blockpoolID.equals(si.getBlockPoolID());
+  boolean namespaceIdMatches(FSImage si) {
+    return namespaceID == si.getStorage().namespaceID;
   }
{code}

Considering that namespace ID is an integer, whereas cluster ID is based on a 
GUID, it seems there is a higher likelihood of accidental collision if we 
compare only namespace ID.  In that case, 
{{CheckpointSignature#validateStorageInfo}} could misidentify a match.  It's 
still highly unlikely, but the probability is non-zero.

I'm wondering if a safer change would be (pseudo-code):

{code}
if namespace ID + cluster ID + blockpool ID are defined on both
  compare all 3 fields
else (only namespace ID is defined on one of them)
  compare only namespace ID
{code}
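
To make the pseudo-code above concrete, here is a minimal, self-contained Java sketch.  It is not the patch and not the real {{CheckpointSignature}} or {{FSImage}} code: the {{StorageIdCheck}} and {{Ids}} classes and the {{matches}} method are hypothetical names, and it assumes that a pre-federation image reports its cluster ID and blockpool ID as null or empty strings (the empty fields in the "Expecting respectively" log line suggest this, but I have not verified it).  It also falls back to the namespace-ID-only comparison when neither side has federation IDs, which the pseudo-code leaves open.

```java
/** Hypothetical stand-in for the ID comparison; not the real HDFS classes. */
final class StorageIdCheck {

  /** The three identifiers discussed above, held in a simple value class. */
  static final class Ids {
    final int namespaceId;
    final String clusterId;    // assumed null/empty for pre-federation images
    final String blockpoolId;  // assumed null/empty for pre-federation images

    Ids(int namespaceId, String clusterId, String blockpoolId) {
      this.namespaceId = namespaceId;
      this.clusterId = clusterId;
      this.blockpoolId = blockpoolId;
    }

    /** True when both federation-era IDs are present. */
    boolean hasFederationIds() {
      return isSet(clusterId) && isSet(blockpoolId);
    }
  }

  private static boolean isSet(String s) {
    return s != null && !s.isEmpty();
  }

  /**
   * Compares all three IDs when both sides define them; otherwise
   * compares only the namespace ID (the pre-fed -> post-fed case).
   */
  static boolean matches(Ids a, Ids b) {
    if (a.namespaceId != b.namespaceId) {
      return false;
    }
    if (a.hasFederationIds() && b.hasFederationIds()) {
      return a.clusterId.equals(b.clusterId)
          && a.blockpoolId.equals(b.blockpoolId);
    }
    // At least one side lacks federation IDs, so namespace ID is all we have.
    return true;
  }
}
```

With this shape, two post-federation images that share a namespace ID but differ in cluster ID are still rejected, while a pre-federation image is accepted against a post-federation one as long as the namespace IDs agree.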

This would keep the logic the same for upgrades between two post-federation 
versions, and change it only for the pre-federation -> post-federation case.

Or am I being too paranoid?  :-)

                
> 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation 
> version of HDFS
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-4462
>                 URL: https://issues.apache.org/jira/browse/HDFS-4462
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: HDFS-4462.patch, HDFS-4462.patch
>
>
> The 2NN currently has logic to detect when its on-disk FS metadata needs an 
> upgrade with respect to the NN's metadata (i.e. the layout versions are 
> different) and in this case it will proceed with the checkpoint despite 
> storage signatures not matching precisely if the BP ID and Cluster ID do 
> match exactly. However, in situations where we're upgrading from versions of 
> HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints 
> will always fail with an error like the following:
> {noformat}
> 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent 
> checkpoint fields.
> LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = 
> CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = 
> BP-1520616013-172.21.3.106-1359680537136.
> Expecting respectively: -19; 403832480; 0; ; .
> {noformat}
