[ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951378#comment-16951378
 ] 

Konstantin Shvachko commented on HDFS-14305:
--------------------------------------------

??This patch was committed over my valid technical objection. I hope you will 
respect that??
I totally respect technical objections. I was under the impression you agreed 
with my reasoning, but I see I was wrong. Addressing your questions below.

??the mitigation for the incompatibility.??
I don't think incompatible changes can be "mitigated". They are not "better 
or worse"; they are unacceptable. For minor versions this rule is documented, but I 
would extend it to major versions as well, since incompatibilities are the reason people 
currently cannot upgrade to 3.x.

As for this issue, there are several different "cases" of overlapping ranges:
# Restarting the same NameNodes on the same binaries and configuration can lead 
to overlapping ranges. This is the problem that was originally reported here. 
The idea was to choose an initial serial number randomly within the range 
designated to the current NameNode. But due to an incorrect formula, if the random 
number is negative, the initial serial number falls outside the designated range 
and therefore intersects with the ranges designated to other NameNodes. My 
patch v08 fixes just that.
# Changing the number of NameNodes in the cluster can cause ranges to overlap. 
This is not solved in the current version. There is a workaround mentioned above, 
but I agree with [~arp] that it should be properly solved. It was _partly_ solved by 
the reverted v06 patch, but at the cost of compatibility.
# Rolling upgrade from a version that does not contain this change to one 
that does. This is not a problem for v08, but it is a problem for v06.
# Changing the order of NameNodes in the configuration. This is not solved by any of the 
approaches.
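To make cases 1 and 2 concrete, here is a minimal Java sketch. It assumes that fixing case 1 amounts to normalizing the remainder into [0, intRange) with {{Math.floorMod}}. That is an illustration only, not necessarily what the v08 patch does. The class and helper names ({{SerialRangeSketch}}, {{normalizedSerial}}, {{designatedRange}}, {{overlaps}}) are hypothetical.

```java
/**
 * Hypothetical sketch (not the HDFS-14305 patch): compares the serial-number
 * formula quoted in the issue with a variant that normalizes the remainder,
 * and shows that NameNode ranges still overlap when the NameNode count changes.
 */
public class SerialRangeSketch {

    // Formula quoted in the issue description. Java's % takes the sign of the
    // dividend, so a negative serialNo yields a remainder in (-intRange, 0].
    static int originalSerial(int serialNo, int numNNs, int nnIndex) {
        int intRange = Integer.MAX_VALUE / numNNs;
        int nnRangeStart = intRange * nnIndex;
        return (serialNo % intRange) + nnRangeStart;
    }

    // Assumed correction: Math.floorMod always returns a value in [0, intRange),
    // so the result stays inside this NameNode's slice.
    static int normalizedSerial(int serialNo, int numNNs, int nnIndex) {
        int intRange = Integer.MAX_VALUE / numNNs;
        int nnRangeStart = intRange * nnIndex;
        return Math.floorMod(serialNo, intRange) + nnRangeStart;
    }

    // Inclusive range [start, end] designated to NameNode nnIndex.
    static long[] designatedRange(int numNNs, int nnIndex) {
        long intRange = Integer.MAX_VALUE / numNNs;
        long start = intRange * nnIndex;
        return new long[] {start, start + intRange - 1};
    }

    static boolean inRange(long value, long[] range) {
        return value >= range[0] && value <= range[1];
    }

    // Two inclusive ranges overlap if the larger start is <= the smaller end.
    static boolean overlaps(long[] a, long[] b) {
        return Math.max(a[0], b[0]) <= Math.min(a[1], b[1]);
    }

    public static void main(String[] args) {
        long[] secondNnRange = designatedRange(2, 1);
        int negativeSeed = -5;

        // Case 1: a negative seed escapes the range under the original formula ...
        System.out.println("original in range:   "
            + inRange(originalSerial(negativeSeed, 2, 1), secondNnRange));   // false
        // ... but stays inside it once the remainder is normalized.
        System.out.println("normalized in range: "
            + inRange(normalizedSerial(negativeSeed, 2, 1), secondNnRange)); // true

        // Case 2: index 0 under the old count (2 NameNodes) overlaps index 1
        // under the new count (3 NameNodes), even with the normalized formula.
        System.out.println("count change overlaps: "
            + overlaps(designatedRange(2, 0), designatedRange(3, 1)));       // true
    }
}
```

Normalizing the remainder keeps each NameNode inside its own slice. The slices themselves, however, are still derived from {{numNNs}} and {{nnIndex}}, so changing either one (cases 2 and 4) needs a separate fix.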

I think we should prevent all of these cases of overlapping ranges, in a 
compatible way, in the next jira. [~arp], would you agree?

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-14305
>                 URL: https://issues.apache.org/jira/browse/HDFS-14305
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, security
>            Reporter: Chao Sun
>            Assignee: Konstantin Shvachko
>            Priority: Major
>              Labels: multi-sbnn, release-blocker
>         Attachments: HDFS-14305-007.patch, HDFS-14305-008.patch, 
> HDFS-14305.001.patch, HDFS-14305.002.patch, HDFS-14305.003.patch, 
> HDFS-14305.004.patch, HDFS-14305.005.patch, HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
>     this.intRange = Integer.MAX_VALUE / numNNs;
>     this.nnRangeStart = intRange * nnIndex;
>     this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.<nameservice>}}.
> However, with this approach, different NameNodes could have overlapping ranges 
> of serial numbers. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in the configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number can be negative, and Java's {{%}} 
> operator returns a negative remainder for a negative dividend.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could fall into a range that belongs to 
> a different NameNode, again increasing the chance of a collision.
> When a collision happens, DataNodes could overwrite an existing key, which 
> will cause clients to fail with an {{InvalidToken}} error.
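The toy ranges in the description can be reproduced by brute force. The sketch below is hypothetical: it substitutes 100 for {{Integer.MAX_VALUE}} (the {{TOY_MAX}} constant is an assumption for illustration), applies the quoted formula to every initial serial number in [-100, 100], and records the smallest and largest result for each NameNode.

```java
/**
 * Hypothetical toy reproduction of the issue's example: Integer.MAX_VALUE is
 * replaced by 100, and every initial serial number in [-100, 100] is fed
 * through the quoted formula to find the range each NameNode can produce.
 */
public class ToyRangeDemo {

    static final int TOY_MAX = 100; // stand-in for Integer.MAX_VALUE

    // Same shape as the quoted formula, with TOY_MAX in place of Integer.MAX_VALUE.
    static int serial(int serialNo, int numNNs, int nnIndex) {
        int intRange = TOY_MAX / numNNs;
        int nnRangeStart = intRange * nnIndex;
        return (serialNo % intRange) + nnRangeStart;
    }

    // Returns {min, max} of the serial numbers NameNode nnIndex can produce.
    static int[] observedRange(int numNNs, int nnIndex) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int s = -TOY_MAX; s <= TOY_MAX; s++) {
            int v = serial(s, numNNs, nnIndex);
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        for (int nnIndex = 0; nnIndex < 2; nnIndex++) {
            int[] r = observedRange(2, nnIndex);
            System.out.println("nn" + (nnIndex + 1) + " -> [" + r[0] + ", " + r[1] + "]");
        }
        // prints:
        // nn1 -> [-49, 49]
        // nn2 -> [1, 99]
    }
}
```

The shared interval [1, 49] is where the two NameNodes can hand out the same serial number.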



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
