[
https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252480#comment-14252480
]
Colin Patrick McCabe commented on HDFS-7443:
--------------------------------------------
When we observed this, they were not hard links, but separate copies. They
were identical (we ran a command-line checksum on them). If possible, I would
rather not start trying to pick the "best" one because I feel like 3x
replication should ensure that we have redundancy in the system, and because
the code would get a lot more complex. Because we create the hard links in
parallel, we would have to somehow accumulate the duplicates and deal with them
at the end, once all worker threads had been joined.
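
To illustrate the "accumulate the duplicates and deal with them at the end"
approach described above, here is a minimal Java sketch. It is not the code in
HDFS-7443.001.patch. The class, method, and field names are hypothetical, and
it assumes plain java.nio.file.Files.createLink() instead of Hadoop's own
native link call.

```java
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: names below are illustrative, not Hadoop APIs.
class DuplicateLinkSketch {

    /** One planned hard link: existing block file -> path in the new layout. */
    static final class LinkTask {
        final Path src;
        final Path dst;

        LinkTask(Path src, Path dst) {
            this.src = src;
            this.dst = dst;
        }
    }

    /**
     * Creates all links on a fixed-size thread pool. A link whose destination
     * already exists is recorded as a duplicate instead of failing the run.
     * The duplicates are returned only after every worker has finished.
     */
    static List<LinkTask> linkAll(List<LinkTask> tasks, int threads)
            throws Exception {
        // Thread-safe queue shared by all workers.
        Queue<LinkTask> duplicates = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (LinkTask t : tasks) {
                Callable<Void> job = () -> {
                    try {
                        // createLink(link, existing): dst becomes a hard link to src.
                        Files.createLink(t.dst, t.src);
                    } catch (FileAlreadyExistsException e) {
                        // Another source file already claimed this destination.
                        duplicates.add(t);
                    }
                    return null;
                };
                futures.add(pool.submit(job));
            }
            // Wait for all workers; any other I/O failure surfaces here.
            for (Future<Void> f : futures) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(duplicates);
    }
}
```

The sketch only covers the accumulation step. Deciding what to do with the
returned duplicates (compare checksums, keep one copy, remove the other) is
the part that would add the extra complexity mentioned above.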
> Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are
> present in the same volume
> ------------------------------------------------------------------------------------------------------
>
> Key: HDFS-7443
> URL: https://issues.apache.org/jira/browse/HDFS-7443
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Kihwal Lee
> Assignee: Colin Patrick McCabe
> Priority: Blocker
> Attachments: HDFS-7443.001.patch
>
>
> When we did an upgrade from 2.5 to 2.6 in a medium-sized cluster, about 4% of
> datanodes were not coming up. They tried the data file layout upgrade for
> BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed.
> All failures were caused by {{NativeIO.link()}} throwing an IOException saying
> {{EEXIST}}. The datanodes didn't die right away; the upgrade was retried each
> time the block pool initialization was retried, which happened whenever
> {{BPServiceActor}} registered with the namenode. After many retries, the
> datanodes terminated. This would leave {{previous.tmp}} and {{current}} with
> no {{VERSION}} file in the block pool slice storage directory.
> Although {{previous.tmp}} contained the old {{VERSION}} file, the content was
> in the new layout and the subdirs were all newly created ones. This
> shouldn't have happened because the upgrade-recovery logic in {{Storage}}
> removes {{current}} and renames {{previous.tmp}} to {{current}} before
> retrying. All successfully upgraded volumes had old state preserved in their
> {{previous}} directory.
> In summary, there were two observed issues.
> - The upgrade failed because {{link()}} failed with {{EEXIST}}.
> - {{previous.tmp}} contained a half-upgraded layout rather than the content
> of the original {{current}}.
> We did not see this in smaller scale test clusters.