[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829890#action_12829890 ]
Todd Lipcon commented on HDFS-898: ---------------------------------- I get slightly different figures than you guys... I am looking at this as identical to the well-known Birthday Problem: http://en.wikipedia.org/wiki/Birthday_problem In our case, we have 2^(64-b) "days" and 2^26 "people" We have 2^(64-b) "days" and B=2^26 "people". Following the formula on Wikipedia: {noformat} In [21]: n = 2^26 In [22]: d = 2^(64-8) In [23]: reduce(operator.mul, [(1 - float(i)/d) for i in xrange(0, n)]) Out[23]: 0.0037908372356959502 {noformat} whereas you've calculated 0.03065 for this case. The python above agrees with Wikipedia for the birthday example, so I think the code is correct: {noformat} In [25]: d = 365 In [26]: n = 23 In [27]: reduce(operator.mul, [(1 - float(i)/d) for i in xrange(0, n)]) Out[27]: 0.49270276567601451 {noformat} Wary of floating point math, I also checked using int math to calculate numerator and denominator, then int division to make them smaller, then float division to get a fraction: {noformat} In [70]: num,denom = (reduce(operator.mul, [d - i for i in xrange(0, n)])), (d**(n)) In [71]: float(num/100000000000000000000)/float(denom/100000000000000000000) Out[71]: 0.0037908372356959502 {noformat} So where are our numbers diverging? > Sequential generation of block ids > ---------------------------------- > > Key: HDFS-898 > URL: https://issues.apache.org/jira/browse/HDFS-898 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node > Affects Versions: 0.20.1 > Reporter: Konstantin Shvachko > Assignee: Konstantin Shvachko > Fix For: 0.22.0 > > Attachments: DuplicateBlockIds.patch, HighBitProjection.pdf > > > This is a proposal to replace random generation of block ids with a > sequential generator in order to avoid block id reuse in the future. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.