[
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988349#comment-14988349
]
James Clampffer commented on HDFS-9103:
---------------------------------------
Thanks for the feedback Haohui!
"The name_match function can be simplified. It looks like the comments are
unnecessary."
Simplified with a ternary expression or similar? I think I can get rid of that
whole block of code if I change Entry to something that makes sense to have in
a set. I disagree about removing comments; they aren't really making things
cluttered and make it easier understand the code quickly.
"This function is on the critical path thus the result should be cached"
I considered this; I wanted to profile it with a handful of dead datanodes
first before adding any complexity. I'll add caching now (ideally to something
that doesn't need to grab the lock).
"Looks like this is a test-only function. Let's name it "TEST_Clear()" for
clarity."
Right now it's test only. I figured eventually it wouldn't hurt to expose that
in the C++ API but for now I'll rename to make things clear.
"The timeout settings needs to be passed in through the Option object.
timeout_duration_ can be marked as const."
I totally missed this. I was waiting on Bob's configuration parser and forgot
that this was already available in the Options object.
"The clock needs to be a steady_clock for monotonicity"
Nice catch, didn't think about that.
"Are there any reasons of using a set<Entry> instead of a vector / map? Using
vector has much better cache locality results in this case."
I was considering swapping out the std::pair for a struct that defined
operator== and operator< based on node id so that the set semantics would make
sense. I should have done that for this patch, I'll add that to the next. The
set should be sitting in cache anyway if it's reasonably sized and accessed
recently but if it turns cache misses are leading to pipeline stalls I'll
certainly swap it out for a vector.
"The above code can be simplified."
Will do.
"The function can be placed in the InputStream class."
Good idea. That gets rid of that block of code that's duplicated in the gmock
test as well.
I'll get a patch up early tomorrow morning to address all of this.
As far as node exclusion policy goes is it reasonable to assume that
kResourceUnavailable will always refer to something local like too many open
sockets? I wouldn't want to prevent exclusion if it's the DN sending that.
> Retry reads on DN failure
> -------------------------
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: Bob Hansen
> Assignee: James Clampffer
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch,
> HDFS-9103.HDFS-8707.006.patch, HDFS-9103.HDFS-8707.3.patch,
> HDFS-9103.HDFS-8707.4.patch, HDFS-9103.HDFS-8707.5.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and
> try again.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)