[ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939011#comment-16939011
 ] 

Konstantin Shvachko commented on HDFS-14305:
--------------------------------------------

Glad we agree. Yes, I regret I bumped into this issue too late.

Another problem is that this change does not prevent collisions during 
regular restarts (after upgrading). If you add a new NameNode at the beginning 
of the list in the config, it will change {{nnIndex}} and therefore the 
respective node ranges.
I guess my point is that there is no "safe" way here, that is, I don't know 
which way is less "risky" as you put it. One way or another you need to know 
the ranges and restart the NNs in an order that avoids collisions. None of 
this is documented or mentioned in the release notes.
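
To illustrate how the ranges move, here is a minimal sketch. It assumes the per-index partitioning quoted in the issue description below; the class and method names are hypothetical, and the range computation in the committed change may differ in detail.

```java
import java.util.Arrays;
import java.util.List;

public class NnRangeShift {
    // Start of the serial-number range for a NameNode, derived from its
    // position in the configured list (hypothetical helper, not HDFS code).
    static int rangeStart(List<String> nameNodes, String nn) {
        int intRange = Integer.MAX_VALUE / nameNodes.size();
        return intRange * nameNodes.indexOf(nn);
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("nn1", "nn2");
        // A new NameNode is added at the beginning of the list.
        List<String> after = Arrays.asList("nn0", "nn1", "nn2");
        for (String nn : before) {
            System.out.printf("%s: range start %d -> %d%n",
                nn, rangeStart(before, nn), rangeStart(after, nn));
        }
    }
}
```

In this sketch nn1 moves from a range starting at 0 to one starting at 715827882 and spanning 715827882 values, which reaches into the range that nn2 keeps using (starting at 1073741823) until nn2 itself is restarted with the new config.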

So my proposal is to revert this change and fix the arithmetic bug in the 
previous implementation. We can then think of a more robust solution, which 
avoids generating ranges based on NameNode ordering.

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-14305
>                 URL: https://issues.apache.org/jira/browse/HDFS-14305
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, security
>            Reporter: Chao Sun
>            Assignee: Xiaoqiao He
>            Priority: Major
>             Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
>         Attachments: HDFS-14305.001.patch, HDFS-14305.002.patch, 
> HDFS-14305.003.patch, HDFS-14305.004.patch, HDFS-14305.005.patch, 
> HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
>     this.intRange = Integer.MAX_VALUE / numNNs;
>     this.nnRangeStart = intRange * nnIndex;
>     this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.<nameservice>}}.
> However, with this approach, different NameNodes could have overlapping 
> ranges for serial numbers. For simplicity, let's assume 
> {{Integer.MAX_VALUE}} is 100, and we have 2 NameNodes {{nn1}} and {{nn2}} in 
> the configuration. Then the ranges for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be negative, and the {{%}} 
> operator in Java keeps the sign of the dividend.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When a collision happens, DataNodes could overwrite an existing key, which 
> will cause clients to fail with an {{InvalidToken}} error.
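
The overlap in the description above can be reproduced with a small sketch. It uses a toy {{MAX}} of 100 in place of {{Integer.MAX_VALUE}}, as in the example; the class and helper names are hypothetical and this is not the HDFS code itself.

```java
public class SerialRangeOverlap {
    // Toy stand-in for Integer.MAX_VALUE, matching the example in the description.
    static final int MAX = 100;

    // The rotation formula from the description, with MAX substituted.
    static int rotate(int serialNo, int nnIndex, int numNNs) {
        int intRange = MAX / numNNs;
        int nnRangeStart = intRange * nnIndex;
        return (serialNo % intRange) + nnRangeStart;
    }

    // Smallest and largest serial number a NameNode can end up with,
    // found by trying positive and negative initial values.
    static int[] reachable(int nnIndex, int numNNs) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int s = -1000; s <= 1000; s++) {
            int v = rotate(s, nnIndex, numNNs);
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        for (int i = 0; i < 2; i++) {
            int[] r = reachable(i, 2);
            System.out.printf("nn%d -> [%d, %d]%n", i + 1, r[0], r[1]);
        }
    }
}
```

Because Java's {{%}} keeps the sign of the dividend, the first NameNode can produce negative values and the second can fall below its own range start, so the two ranges share the values 1 through 49.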



--
This message was sent by Atlassian Jira
(v8.3.4#803005)