[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939011#comment-16939011 ]
Konstantin Shvachko commented on HDFS-14305: -------------------------------------------- Glad we agree. Yes, I regret I bumped into this issue too late. Another problem, that this change does not prevent from collisions during regular restarts (after upgrading). If you add a new NameNode in the beginning of the list in the config it will change {{nnIndex}} and therefore the respective node ranges. I guess my point is that there is no "safe" way here, that is, I don't know which way is less "risky" as you put it. One way or another you need to know the ranges and follow a certain order of restarting NNs, which avoids collisions. And all these are not documented or mentioned in the release notes. So my proposal is to revert this change, and fix the arithmetic bug in previous implementation. We can then think of a more robust solution, which avoids generating ranges based on NameNode ordering. > Serial number in BlockTokenSecretManager could overlap between different > namenodes > ---------------------------------------------------------------------------------- > > Key: HDFS-14305 > URL: https://issues.apache.org/jira/browse/HDFS-14305 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, security > Reporter: Chao Sun > Assignee: Xiaoqiao He > Priority: Major > Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3 > > Attachments: HDFS-14305.001.patch, HDFS-14305.002.patch, > HDFS-14305.003.patch, HDFS-14305.004.patch, HDFS-14305.005.patch, > HDFS-14305.006.patch > > > Currently, a {{BlockTokenSecretManager}} starts with a random integer as the > initial serial number, and then use this formula to rotate it: > {code:java} > this.intRange = Integer.MAX_VALUE / numNNs; > this.nnRangeStart = intRange * nnIndex; > this.serialNo = (this.serialNo % intRange) + (nnRangeStart); > {code} > while {{numNNs}} is the total number of NameNodes in the cluster, and > {{nnIndex}} is the index of the current NameNode specified in the > configuration {{dfs.ha.namenodes.<nameservice>}}. > However, with this approach, different NameNode could have overlapping ranges > for serial number. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, > and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges > for these two are: > {code} > nn1 -> [-49, 49] > nn2 -> [1, 99] > {code} > This is because the initial serial number could be any negative integer. > Moreover, when the keys are updated, the serial number will again be updated > with the formula: > {code} > this.serialNo = (this.serialNo % intRange) + (nnRangeStart); > {code} > which means the new serial number could be updated to a range that belongs to > a different NameNode, thus increasing the chance of collision again. > When the collision happens, DataNodes could overwrite an existing key which > will cause clients to fail because of {{InvalidToken}} error. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org