[ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816394#comment-13816394
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-2832:
----------------------------------------------

> With a billion nodes the probability of a collision in a 128-bit space is 
> less than 1 in 10^20. ...

Let n be the number of possible IDs.
Let m be the number of nodes.
The probability of no collision is P = n!/((n-m)! n^m).

Putting n=2^128 and m=10^9, we have
* P ~= 0.99999999999999999999853063206294150856

The probability of collision is
* 1-P ~= 1.4693679370584914464 * 10^(-21) < 10^(-20).

However, randomly generated UUIDs only have 122 random bits according to 
[Wikipedia|http://en.wikipedia.org/wiki/UUID#Random_UUID_probability_of_duplicates].
Now putting n=2^122 and m=10^9, we have
* P ~= 0.99999999999999999990596045202825654743

The probability of collision is
* 1-P ~= 9.403954797174345257 * 10^(-20) < 10^(-19)

A similar result can be obtained using the approximation P ~= exp(-m^2/(2*n)).
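
The approximation is easy to check numerically. Below is a minimal Python sketch, assuming the IDs are drawn uniformly at random; the function name collision_probability is only illustrative and is not part of any patch. It uses math.expm1 so that the tiny value 1-P is not lost to floating-point rounding.

```python
import math


def collision_probability(m: int, bits: int) -> float:
    """Approximate 1-P, the probability of at least one collision among
    m IDs drawn uniformly at random from a space of n = 2^bits values.

    Uses P ~= exp(-m^2/(2*n)), so 1-P ~= -expm1(-m^2/(2*n)).
    """
    n = 2 ** bits
    # Integer / integer true division gives a correctly rounded float.
    x = (m * m) / (2 * n)
    # -expm1(-x) computes 1 - exp(-x) without cancellation for small x.
    return -math.expm1(-x)


if __name__ == "__main__":
    nodes = 10 ** 9
    for bits in (128, 122):
        print(f"bits={bits}: 1-P ~= {collision_probability(nodes, bits):.6e}")
```

For m=10^9 the results should agree with the exact values above to roughly nine significant digits, since m^2 and m*(m-1) differ only by a relative 10^(-9).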


> Enable support for heterogeneous storages in HDFS
> -------------------------------------------------
>
>                 Key: HDFS-2832
>                 URL: https://issues.apache.org/jira/browse/HDFS-2832
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 0.24.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>         Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, 
> h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, 
> h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, 
> h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch
>
>
> HDFS currently supports configuration where storages are a list of 
> directories. Typically each of these directories corresponds to a volume with 
> its own file system. All these directories are homogeneous and therefore 
> identified as a single storage at the namenode. I propose changing the 
> current model, where a Datanode *is a* storage, to one where a Datanode 
> *is a collection* of storages. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)
