[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815376#comment-13815376 ]
Konstantin Shvachko commented on HDFS-2832:
-------------------------------------------
Arpit, I think we just agreed that collisions among UUIDs are possible but have
low probability.
This is a concern for me. Even though unlikely, a collision, if it happens,
creates a serious problem for system integrity.
Does it concern you?
In my previous comment I tried to explain that in the distributed case the
randomness itself is the main problem. Forget for a moment about PRNGs. Assume
that the UUID is an incremental counter (such as the generation stamp (and now
the block id)), which is incremented by each node independently, but at startup
each node chooses a random number to start from. On a single node ++ can go on
without collisions for a long enough time to guarantee I will never see one.
Y4K bug is fine with me.
But if you take a second node and randomly choose a starting number, it could
be close to (say, 1000 apart from) the starting point of the first node. Then
the second node can only generate 1000 storageIDs before colliding with those
generated by the other node.
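
The counter thought experiment above can be sketched in a few lines of Java.
This is only an illustration of the argument, not HDFS code: the class
CounterCollisionSketch, the nested CounterIdSource, and the chosen starting
values are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class CounterCollisionSketch {
    /** Hypothetical ID source: a plain counter with an arbitrary start value. */
    static final class CounterIdSource {
        private long next;
        CounterIdSource(long start) { this.next = start; }
        long nextId() { return next++; }
    }

    /**
     * Returns how many IDs the second node can issue before it produces one
     * that the first node has already issued, or -1 if no collision occurs
     * within 'limit' IDs.
     */
    static long idsBeforeCollision(long firstStart, long firstIssued,
                                   long secondStart, long limit) {
        Set<Long> issuedByFirst = new HashSet<>();
        CounterIdSource first = new CounterIdSource(firstStart);
        for (long i = 0; i < firstIssued; i++) {
            issuedByFirst.add(first.nextId());
        }
        CounterIdSource second = new CounterIdSource(secondStart);
        for (long n = 0; n < limit; n++) {
            if (issuedByFirst.contains(second.nextId())) {
                return n; // n IDs were issued safely before this one
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long firstStart = 1_000_000L;          // assumed start of node 1
        long secondStart = firstStart - 1000;  // node 2 happens to start 1000 below
        System.out.println(idsBeforeCollision(firstStart, 10, secondStart, 10_000));
    }
}
```

With these assumed values the second node issues 1000 IDs safely and collides
on the next one, regardless of how large the counter's full range is.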
The same holds for a PRNG: you just replace ++ with next(). A long period
doesn't matter if you choose your starting points randomly.
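
The same argument with ++ replaced by next() can be sketched with a toy
generator. The sketch below assumes a small full-period linear congruential
generator modulo 2^16 (a = 5, c = 1); it is not the generator used for UUIDs
or anywhere in HDFS, and the class LcgOverlapSketch and the seeds are
hypothetical. The second seed is deliberately placed 1000 steps behind the
first on the same cycle to model an unlucky random choice.

```java
import java.util.HashSet;
import java.util.Set;

class LcgOverlapSketch {
    // Toy LCG modulo 2^16; a = 5, c = 1 give a full period of 65536.
    static final int MOD = 1 << 16;

    static int next(int state) {
        return (5 * state + 1) % MOD;
    }

    /** Advances a state by the given number of steps along the cycle. */
    static int advance(int state, int steps) {
        for (int i = 0; i < steps; i++) {
            state = next(state);
        }
        return state;
    }

    /**
     * Returns how many outputs the second generator yields before repeating
     * an output already produced by the first, or -1 if none repeats within
     * one full period.
     */
    static int outputsBeforeCollision(int firstSeed, int firstIssued, int secondSeed) {
        Set<Integer> issuedByFirst = new HashSet<>();
        int s = firstSeed;
        for (int i = 0; i < firstIssued; i++) {
            s = next(s);
            issuedByFirst.add(s);
        }
        int t = secondSeed;
        for (int n = 0; n < MOD; n++) {
            t = next(t);
            if (issuedByFirst.contains(t)) {
                return n;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int secondSeed = 12345;                     // assumed seed of node 2
        int firstSeed = advance(secondSeed, 1000);  // node 1 sits 1000 steps ahead
        System.out.println(outputsBeforeCollision(firstSeed, 10, secondSeed));
    }
}
```

Both generators walk the same cycle, so the distance between the two starting
points, not the period, bounds how many values the second one yields before
it starts repeating the first one's output.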
> Enable support for heterogeneous storages in HDFS
> -------------------------------------------------
>
> Key: HDFS-2832
> URL: https://issues.apache.org/jira/browse/HDFS-2832
> Project: Hadoop HDFS
> Issue Type: New Feature
> Affects Versions: 0.24.0
> Reporter: Suresh Srinivas
> Assignee: Suresh Srinivas
> Attachments: 20130813-HeterogeneousStorage.pdf, h2832_20131023.patch,
> h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch,
> h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch,
> h2832_20131104.patch, h2832_20131105.patch
>
>
> HDFS currently supports configuration where storages are a list of
> directories. Typically each of these directories corresponds to a volume with
> its own file system. All these directories are homogeneous and therefore
> identified as a single storage at the namenode. I propose a change from the
> current model, where a Datanode *is a* storage, to one where a Datanode *is a
> collection* of storages.
--
This message was sent by Atlassian JIRA
(v6.1#6144)