[
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Clampffer updated HDFS-9103:
----------------------------------
Attachment: HDFS-9103.HDFS-8707.007.patch
New patch. There's a bit of extra noise because clang-format touched a few
files that hadn't been formatted before.
Addressing Haohui's batch of concerns in order:
-That name_match function isn't needed after switching bad_datanodes_ to a map.
-Got rid of BadDataNodeTracker::GetNodesToExclude and added an IsBadNode method
instead. The InputStream takes a shared_ptr to the BadDataNodeTracker and
calls IsBadNode directly. This should remove any need for caching, since it
avoids a lot of copies and the work of building sets of strings.
-Got rid of BadDataNodeTracker::Clear entirely and changed the tests so that
BadDataNodeTracker is scoped by test function. This avoids issues with
possibly carrying state between tests.
-Added a datanode exclusion duration to the Option class with a default of 10
minutes. Switched time units to milliseconds to be consistent. Is there a
standard name for this setting? I didn't see anything in the options used for
hdfs-site.xml.
-Switched from system_clock to steady_clock so that measured time never goes
backwards (system_clock can be adjusted while the client is running).
-I think the way I rearranged the code that this comment referred to simplified
it. If it didn't, please let me know what exactly needs to be simplified.
-Made ShouldExclude a static method of InputStream and got rid of the duplicate
used by the gmock test.
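
For anyone skimming without the patch open, here is a rough sketch of the
tracker shape described in the list above. It is not the code from the patch:
only BadDataNodeTracker, IsBadNode and bad_datanodes_ come from the notes, while
AddBadNode, the Clock alias, the string datanode id, the injectable "now"
argument and the mutex are assumptions made for illustration.

```cpp
#include <chrono>
#include <map>
#include <mutex>
#include <string>

// Illustrative sketch only; see the note above for which names are assumed.
class BadDataNodeTracker {
 public:
  // steady_clock is monotonic, so elapsed-time checks are not affected by
  // wall-clock adjustments.
  using Clock = std::chrono::steady_clock;

  // Default of 10 minutes, stored in milliseconds.
  explicit BadDataNodeTracker(
      std::chrono::milliseconds exclusion_duration = std::chrono::minutes(10))
      : exclusion_duration_(exclusion_duration) {}

  // Record (or refresh) the time at which a datanode was seen failing.
  // AddBadNode and the 'now' parameter are hypothetical; 'now' is injectable
  // so tests do not need to sleep.
  void AddBadNode(const std::string &datanode_id,
                  Clock::time_point now = Clock::now()) {
    std::lock_guard<std::mutex> lock(mutex_);
    bad_datanodes_[datanode_id] = now;
  }

  // True while the node is still inside its exclusion window. Entries that
  // have expired are erased lazily on lookup.
  bool IsBadNode(const std::string &datanode_id,
                 Clock::time_point now = Clock::now()) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = bad_datanodes_.find(datanode_id);
    if (it == bad_datanodes_.end()) return false;
    if (now - it->second >= exclusion_duration_) {
      bad_datanodes_.erase(it);
      return false;
    }
    return true;
  }

 private:
  std::mutex mutex_;
  std::chrono::milliseconds exclusion_duration_;
  // Keyed lookup replaces the linear scan that needed a name-matching helper.
  std::map<std::string, Clock::time_point> bad_datanodes_;
};
```

Because each test can construct its own tracker, no Clear method is needed to
reset state between tests, and a caller holding a shared_ptr to the tracker can
ask IsBadNode per datanode instead of copying out a set of names.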
> Retry reads on DN failure
> -------------------------
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: Bob Hansen
> Assignee: James Clampffer
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch,
> HDFS-9103.HDFS-8707.006.patch, HDFS-9103.HDFS-8707.007.patch,
> HDFS-9103.HDFS-8707.3.patch, HDFS-9103.HDFS-8707.4.patch,
> HDFS-9103.HDFS-8707.5.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and
> try again.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)