[ 
https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829890#action_12829890
 ] 

Todd Lipcon commented on HDFS-898:
----------------------------------

I get slightly different figures than you guys... I am looking at this as 
identical to the well-known Birthday Problem: 
http://en.wikipedia.org/wiki/Birthday_problem

In our case, we have d = 2^(64-b) "days" and n = 2^26 "people". Following the 
formula on Wikipedia, with b = 8:

{noformat}
In [21]: n = 2^26
In [22]: d = 2^(64-8)
In [23]: reduce(operator.mul, [(1 - float(i)/d)  for i in xrange(0, n)])
Out[23]: 0.0037908372356959502
{noformat}

whereas you've calculated 0.03065 for this case.

The Python above agrees with Wikipedia for the birthday example, so I think the 
code is correct:

{noformat}
In [25]: d = 365
In [26]: n = 23
In [27]: reduce(operator.mul, [(1 - float(i)/d)  for i in xrange(0, n)])
Out[27]: 0.49270276567601451
{noformat}

Wary of floating point math, I also checked using int math to calculate 
numerator and denominator, then int division to make them smaller, then float 
division to get a fraction:
{noformat}
In [70]: num,denom = (reduce(operator.mul, [d - i for i in xrange(0, n)])), (d**(n))
In [71]: float(num/100000000000000000000)/float(denom/100000000000000000000)
Out[71]: 0.0037908372356959502
{noformat}
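
The same cross-check can be done in exact rational arithmetic, without the ad 
hoc division by 10^20. A minimal sketch in Python 3, assuming the standard 
library's fractions.Fraction and math.prod (3.8+); p_distinct_fraction is a 
hypothetical helper name, not something from this issue:

```python
from fractions import Fraction
from math import prod


def p_distinct_fraction(n, d):
    """Exact probability that n uniform draws from d values are all distinct."""
    # Numerator: d * (d - 1) * ... * (d - n + 1); denominator: d ** n.
    return Fraction(prod(d - i for i in range(n)), d ** n)


# Birthday example: 23 people, 365 days.
p = p_distinct_fraction(23, 365)
print(float(p))  # about 0.4927
```

Converting to float only at the very end means any rounding happens once, on 
the final ratio.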

So where are our numbers diverging?
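
One thing worth checking in the sessions above: in Python, ^ is bitwise XOR 
rather than exponentiation, so 2^26 evaluates to 24 and 2^(64-8) evaluates to 
58. A minimal sketch in Python 3 that compares the two readings follows. The 
helper names p_distinct_exact and p_distinct_approx are hypothetical, and the 
second one uses the exp(-n(n-1)/(2d)) approximation from the same Wikipedia 
page, since multiplying 2**26 factors in a loop is slow:

```python
import math
import operator
from functools import reduce


def p_distinct_exact(n, d):
    """Product of (1 - i/d) for i in 0..n-1, as in the sessions above."""
    return reduce(operator.mul, [1 - i / d for i in range(n)], 1.0)


def p_distinct_approx(n, d):
    """Approximation exp(-n(n-1)/(2d)), reasonable when n is much smaller than d."""
    return math.exp(-n * (n - 1) / (2 * d))


# With ^ (XOR) the inputs are tiny: n = 24, d = 58.
print(2 ^ 26, 2 ^ (64 - 8))
print(p_distinct_exact(2 ^ 26, 2 ^ (64 - 8)))     # about 0.0038

# With ** (power) the inputs are the intended 2**26 and 2**56.
print(p_distinct_approx(2 ** 26, 2 ** (64 - 8)))  # about 0.969
```

If this is the cause, the ** version puts the probability of at least one 
collision near 0.031, which is close to the 0.03065 quoted above.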

> Sequential generation of block ids
> ----------------------------------
>
>                 Key: HDFS-898
>                 URL: https://issues.apache.org/jira/browse/HDFS-898
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.20.1
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>             Fix For: 0.22.0
>
>         Attachments: DuplicateBlockIds.patch, HighBitProjection.pdf
>
>
> This is a proposal to replace random generation of block ids with a 
> sequential generator in order to avoid block id reuse in the future.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
