[
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277795#comment-14277795
]
Tsz Wo Nicholas Sze commented on HDFS-7575:
-------------------------------------------
{quote}
> BTW, UUID.randomUUID isn't guaranteed to return a unique id. It's highly
> improbable but possible, and more likely in practice due to older storages, a
> user copying a storage, etc. Although the storage IDs are unique after the
> "upgrade", if a disk is moved from one node to another, then a collision is
> possible. This is another reason why I feel that explicitly checking for
> collisions at startup should always be done.
UUIDs are designed to be globally unique with high probability when generated
by trusted processes, even when the volume of generated UUIDs is very high,
which is certainly not the case for storage IDs. The probability of a storage
ID collision in normal operation is vanishingly small.
https://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates
{quote}
We usually compare the probability of collision with the probability of
hardware failure, or we appeal to the famous cosmic ray argument
(http://stackoverflow.com/questions/2580933/cosmic-rays-what-is-the-probability-they-will-affect-a-program),
since we can never do better than that.
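For a rough sense of scale, the standard birthday bound p ≈ n^2 / (2N) can be
evaluated directly. The numbers below are purely illustrative (10^8 generated
IDs is far beyond any real cluster) and are not taken from the HDFS code:
{code:title=CollisionBound.java (illustrative sketch)}
public class CollisionBound {
  public static void main(String[] args) {
    // Birthday bound: p ≈ n^2 / (2N) for n IDs drawn from a space of size N.
    double n = 1e8;                          // illustrative: 10^8 generated IDs
    double uuidSpace = Math.pow(2, 122);     // random bits in a version-4 UUID
    double blockIdSpace = Math.pow(2, 64);   // the old random block ID space
    System.out.printf("UUID collision bound:      %.3g%n", n * n / (2 * uuidSpace));
    System.out.printf("64-bit ID collision bound: %.3g%n", n * n / (2 * blockIdSpace));
  }
}
{code}
The first bound comes out around 10^-21, the second around 10^-4, which is why
the two cases are treated so differently.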
{quote}
... Up until HDFS-4645, HDFS used randomly generated block IDs drawn from a far
smaller space (2^64) and we never had a problem. ...
{quote}
We did have a collision check for random block IDs.
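For illustration, such a generate-and-check loop is only a few lines. This is a
minimal sketch, with a plain {{Set}} standing in for the NameNode's block map:
{code:title=RandomIdAllocator.java (illustrative sketch)}
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class RandomIdAllocator {
  private final Set<Long> inUse = new HashSet<>(); // stand-in for the block map
  private final Random rand = new Random();

  long allocate() {
    long id = rand.nextLong();
    while (!inUse.add(id)) { // add() returns false on a duplicate: retry
      id = rand.nextLong();
    }
    return id;
  }
}
{code}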
> NameNode not handling heartbeats properly after HDFS-2832
> ---------------------------------------------------------
>
> Key: HDFS-7575
> URL: https://issues.apache.org/jira/browse/HDFS-7575
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.4.0, 2.5.0, 2.6.0
> Reporter: Lars Francke
> Assignee: Arpit Agarwal
> Priority: Critical
> Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch,
> HDFS-7575.03.binary.patch, HDFS-7575.03.patch, HDFS-7575.04.binary.patch,
> HDFS-7575.04.patch, HDFS-7575.05.binary.patch, HDFS-7575.05.patch,
> testUpgrade22via24GeneratesStorageIDs.tgz,
> testUpgradeFrom22GeneratesStorageIDs.tgz,
> testUpgradeFrom24PreservesStorageId.tgz
>
>
> Before HDFS-2832, each DataNode had a single unique storageId, which included
> its IP address. Since HDFS-2832, DataNodes have a unique storageId per
> storage directory, which is just a random UUID.
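> Generating such an ID is essentially a one-liner. The {{DS-}} prefix below
> matches what recent releases produce, but treat this as an illustrative
> sketch rather than the exact HDFS code:
> {code:title=StorageIdExample.java (illustrative sketch)}
> import java.util.UUID;
>
> public class StorageIdExample {
>   public static void main(String[] args) {
>     // One random UUID per storage directory, unlike the old
>     // per-DataNode ID that embedded the IP address.
>     String storageId = "DS-" + UUID.randomUUID();
>     System.out.println(storageId);
>   }
> }
> {code}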
> The DataNodes send reports per storage directory in their heartbeats. Each
> heartbeat is processed on the NameNode in the
> {{DatanodeDescriptor#updateHeartbeatState}} method. Pre-HDFS-2832 this would
> just store the information per DataNode. After the patch, though, each
> DataNode can have multiple storages, so the information is stored in a map
> keyed by the storage Id.
> This works fine for all clusters that were installed post-HDFS-2832, as they
> get a UUID for their storage Id. So a DN with 8 drives has a map with 8
> different keys. On each heartbeat the map is searched and updated
> ({{DatanodeStorageInfo storage = storageMap.get(s.getStorageID());}}):
> {code:title=DatanodeStorageInfo}
> void updateState(StorageReport r) {
>   // Overwrites the per-storage values; nothing is accumulated.
>   capacity = r.getCapacity();
>   dfsUsed = r.getDfsUsed();
>   remaining = r.getRemaining();
>   blockPoolUsed = r.getBlockPoolUsed();
> }
> {code}
> On clusters that were upgraded from a pre-HDFS-2832 version, though, the
> storage Id has not been rewritten (at least not on the four clusters I
> checked), so each directory has the exact same storageId. That means there'll
> be only a single entry in the {{storageMap}}, and it'll be overwritten by a
> random {{StorageReport}} from the DataNode. This can be seen in the
> {{updateState}} method above: it just assigns the capacity from the received
> report, whereas it should probably sum the values across all reports received
> in a heartbeat.
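> A summing version might look roughly like this (just a sketch; the getter
> names follow the method shown above):
> {code:title=Summing reports (illustrative sketch)}
> // Aggregate all reports from one heartbeat instead of keeping only
> // whichever report happens to be processed last for a duplicated ID.
> long capacity = 0, dfsUsed = 0, remaining = 0, blockPoolUsed = 0;
> for (StorageReport r : reports) {
>   capacity      += r.getCapacity();
>   dfsUsed       += r.getDfsUsed();
>   remaining     += r.getRemaining();
>   blockPoolUsed += r.getBlockPoolUsed();
> }
> {code}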
> The Balancer seems to be one of the only things that actually uses this
> information, so it now considers the utilization of a random drive per
> DataNode for balancing purposes.
> Things get even worse when a drive has been added or replaced, as the new
> drive gets a new storage Id, so there'll be two entries in the storageMap. As
> new drives are usually empty, this skews the Balancer's decision such that
> this node will never be considered over-utilized.
> Another problem is that old StorageReports are never removed from the
> storageMap. So if I replace a drive and it gets a new storage Id, the old
> entry will still be in place and used for all calculations by the Balancer
> until the NameNode is restarted.
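> A pruning pass at heartbeat time could address this. A minimal sketch,
> assuming the report exposes its storage ID (accessor names assumed):
> {code:title=Pruning stale storages (illustrative sketch)}
> // Drop entries whose storage ID no longer appears in the heartbeat,
> // so replaced drives stop feeding stale numbers to the Balancer.
> Set<String> reported = new HashSet<>();   // java.util
> for (StorageReport r : reports) {
>   reported.add(r.getStorage().getStorageID());
> }
> storageMap.keySet().retainAll(reported);
> {code}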
> I can try providing a patch that does the following:
> * Instead of using a map, store the array we receive as-is, or, rather than
> storing an array, sum up the values for reports that share the same Id
> * On each heartbeat, clear the map first, so we know we have up-to-date
> information (a combined sketch follows below)
> Does that sound sensible?
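> Combining the two ideas, the heartbeat path could start from a clean map and
> repopulate it from the incoming reports. A sketch only; the constructor and
> accessors are assumed, and duplicate IDs would be summed as sketched above:
> {code:title=Clear-and-rebuild heartbeat handling (illustrative sketch)}
> synchronized (storageMap) {
>   storageMap.clear();                           // no stale entries survive
>   for (StorageReport r : reports) {
>     String id = r.getStorage().getStorageID();  // accessor assumed
>     DatanodeStorageInfo info = storageMap.get(id);
>     if (info == null) {
>       info = new DatanodeStorageInfo(this, r.getStorage()); // ctor assumed
>       storageMap.put(id, info);
>     }
>     info.updateState(r);
>   }
> }
> {code}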
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)