[ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815376#comment-13815376
 ] 

Konstantin Shvachko commented on HDFS-2832:
-------------------------------------------

Arpit, I think we just agreed that collisions among UUIDs are possible but have 
low probability. 
This is a concern for me. Even though unlikely, a collision, if it happens, 
creates a serious problem for system integrity. 
Does it concern you?

In my previous comment I tried to explain that in the distributed case 
randomness itself is the main problem. Forget for a moment about PRNGs. Assume 
that the UUID is an incremental counter (such as the generation stamp, and now 
the block id), which each node increments independently, but at startup each 
node chooses a random number to start from. On a single node ++ can go on 
without collisions for long enough to guarantee I will never see one. A Y4K bug 
is fine with me.
But if you take a second node and randomly choose its starting number, it could 
land close to (say, 1000 apart from) the starting point of the first node. Then 
the second node can generate only 1000 storageIDs before colliding with those 
generated by the first node.
The same holds for a PRNG: you just replace ++ with next(). A long period 
doesn't matter if you choose your starting points randomly.
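
To make the counter scenario concrete, here is a minimal sketch. It is not HDFS 
code: the function name, the lockstep stepping, and the 64-bit ID space are all 
assumptions chosen for illustration. It only shows how many IDs two nodes can 
issue before one reuses an ID of the other for a given pair of starting points. 
It says nothing about how likely such starting points are.

```python
def steps_before_collision(start_a, start_b, id_space, max_steps):
    """Return how many IDs each node issues before one of them produces an ID
    the other has already issued, or None if that does not happen within
    max_steps. Both nodes are hypothetical counters that issue one ID per
    step by incrementing modulo id_space."""
    seen_a, seen_b = set(), set()
    for step in range(max_steps):
        id_a = (start_a + step) % id_space
        id_b = (start_b + step) % id_space
        seen_a.add(id_a)
        seen_b.add(id_b)
        # A collision is any ID that both nodes have issued so far.
        if id_a in seen_b or id_b in seen_a:
            return step
    return None


ID_SPACE = 2 ** 64  # assumed size of the ID space, for illustration only

# Starting points 1000 apart: 1000 IDs are issued safely, the next collides.
print(steps_before_collision(5000, 4000, ID_SPACE, 10_000))

# Starting points half the ID space apart: no collision within 10,000 steps.
print(steps_before_collision(0, 2 ** 63, ID_SPACE, 10_000))
```

With the starting points 1000 apart the first call prints 1000, matching the 
number in the argument above. The second call prints None.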

> Enable support for heterogeneous storages in HDFS
> -------------------------------------------------
>
>                 Key: HDFS-2832
>                 URL: https://issues.apache.org/jira/browse/HDFS-2832
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 0.24.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>         Attachments: 20130813-HeterogeneousStorage.pdf, h2832_20131023.patch, 
> h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, 
> h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, 
> h2832_20131104.patch, h2832_20131105.patch
>
>
> HDFS currently supports a configuration where storages are a list of 
> directories. Typically each of these directories corresponds to a volume with 
> its own file system. All these directories are homogeneous and therefore 
> identified as a single storage at the namenode. I propose changing the 
> current model, where a Datanode *is a* storage, to one where a Datanode 
> *is a collection* of storages. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)
