[
https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251106#comment-14251106
]
Colin Patrick McCabe commented on HDFS-7443:
--------------------------------------------
It appears that the old software could sometimes create a duplicate copy of the
same block in two different {{subdir}} folders on the same volume. In all the
cases in which we've seen this, the block files were identical: two files
for the same block ID, in separate directories. This appears to be a bug,
since obviously we don't want to store the same block twice on the same volume.
This causes the {{EEXIST}} problem on upgrade, since the new block layout only
has one place where each block ID can go. Unfortunately, the hardlink code
doesn't print the name of the file which caused the problem, making diagnosis
more difficult than it should be.
One easy way around this is to check for duplicate block IDs on each volume
before upgrading, and manually remove the duplicates.
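For example, a small standalone scan along these lines (just a sketch, not part
of HDFS; the class name and example path are made up) could be pointed at each
volume's {{finalized}} directory before the upgrade to list any block ID that
shows up in more than one {{subdir}}:
{code:java}
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.Stream;

public class FindDuplicateBlocks {
  public static void main(String[] args) throws IOException {
    // e.g. /data/1/dfs/dn/current/BP-.../current/finalized (pre-upgrade layout)
    Path finalized = Paths.get(args[0]);
    Map<String, List<Path>> byBlockId = new HashMap<>();
    try (Stream<Path> files = Files.walk(finalized)) {
      files.filter(Files::isRegularFile)
           // block data files look like blk_<id>; skip the _<genstamp>.meta files
           .filter(p -> p.getFileName().toString().matches("blk_-?\\d+"))
           .forEach(p -> byBlockId
               .computeIfAbsent(p.getFileName().toString(), k -> new ArrayList<>())
               .add(p));
    }
    byBlockId.forEach((block, paths) -> {
      if (paths.size() > 1) {
        System.out.println("duplicate " + block + ":");
        paths.forEach(p -> System.out.println("  " + p));
      }
    });
  }
}
{code}
Any block ID reported more than once can then be checked by hand and the extra
copy removed before kicking off the upgrade.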
We should also consider logging an error message and continuing the upgrade
process when we encounter this.
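Something like the following (again only a sketch of the idea, not the actual
upgrade code; {{Files.createLink}} stands in for {{NativeIO.link()}} here) would
name both files when the target already exists and keep going instead of
failing the whole volume:
{code:java}
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TolerantBlockLink {
  /** Hard-link one block file into its new-layout location, tolerating EEXIST. */
  public static void linkBlockFile(Path oldBlockFile, Path newBlockFile)
      throws IOException {
    try {
      Files.createLink(newBlockFile, oldBlockFile);
    } catch (FileAlreadyExistsException e) {
      // EEXIST: some file for this block ID is already at the target.
      // Name both paths so the duplicate is easy to track down, and continue
      // with the rest of the upgrade instead of aborting it.
      System.err.println("Duplicate block file " + oldBlockFile
          + ": target " + newBlockFile + " already exists, skipping");
    }
  }
}
{code}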
[~kihwal], I'm not sure why, in your case, the DataNode retried the hard link
process multiple times. I'm also not sure why you ended up with a jumbled
{{previous.tmp}} directory. When we reproduced this on CDH5.2, we did not have
that problem, for whatever reason.
> Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails
> --------------------------------------------------------
>
> Key: HDFS-7443
> URL: https://issues.apache.org/jira/browse/HDFS-7443
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Kihwal Lee
> Priority: Blocker
>
> When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of
> the datanodes were not coming up. They tried the data file layout upgrade for
> BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed.
> All failures were caused by {{NativeIO.link()}} throwing an IOException with
> {{EEXIST}}. The datanodes didn't die right away; the upgrade was retried each
> time block pool initialization was retried, whenever {{BPServiceActor}} was
> registering with the namenode. After many retries, the datanodes terminated.
> This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in
> the block pool slice storage directory.
> Although {{previous.tmp}} contained the old {{VERSION}} file, the content was
> in the new layout and the subdirs were all newly created ones. This
> shouldn't have happened because the upgrade-recovery logic in {{Storage}}
> removes {{current}} and renames {{previous.tmp}} to {{current}} before
> retrying. All successfully upgraded volumes had old state preserved in their
> {{previous}} directory.
> In summary, there were two observed issues:
> - Upgrade failure with {{link()}} failing with {{EEXIST}}
> - {{previous.tmp}} contained not the content of the original {{current}}, but a
> half-upgraded one.
> We did not see this in smaller scale test clusters.