[ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988349#comment-14988349
 ] 

James Clampffer commented on HDFS-9103:
---------------------------------------

Thanks for the feedback Haohui!

"The name_match function can be simplified. It looks like the comments are 
unnecessary."
Simplified with a ternary expression or similar?  I think I can get rid of that 
whole block of code if I change Entry to something that makes sense to have in 
a set.  I disagree about removing comments; they aren't really making things 
cluttered and make it easier understand the code quickly.  

"This function is on the critical path thus the result should be cached"
I considered this; I wanted to profile it with a handful of dead datanodes 
first before adding any complexity.  I'll add caching now (ideally to something 
that doesn't need to grab the lock).

"Looks like this is a test-only function. Let's name it "TEST_Clear()" for 
clarity."
Right now it's test only.  I figured eventually it wouldn't hurt to expose that 
in the C++ API but for now I'll rename to make things clear.

"The timeout settings needs to be passed in through the Option object. 
timeout_duration_ can be marked as const."
I totally missed this.  I was waiting on Bob's configuration parser and forgot 
that this was already available in the Options object.

"The clock needs to be a steady_clock for monotonicity"
Nice catch, didn't think about that.

"Are there any reasons of using a set<Entry> instead of a vector / map? Using 
vector has much better cache locality results in this case."
I was considering swapping out the std::pair for a struct that defined 
operator== and operator< based on node id so that the set semantics would make 
sense.  I should have done that for this patch, I'll add that to the next.  The 
set should be sitting in cache anyway if it's reasonably sized and accessed 
recently but if it turns cache misses are leading to pipeline stalls I'll 
certainly swap it out for a vector.

"The above code can be simplified."
Will do.

"The function can be placed in the InputStream class."
Good idea.  That gets rid of that block of code that's duplicated in the gmock 
test as well.

I'll get a patch up early tomorrow morning to address all of this.  

As far as node exclusion policy goes is it reasonable to assume that 
kResourceUnavailable will always refer to something local like too many open 
sockets?  I wouldn't want to prevent exclusion if it's the DN sending that. 



> Retry reads on DN failure
> -------------------------
>
>                 Key: HDFS-9103
>                 URL: https://issues.apache.org/jira/browse/HDFS-9103
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Bob Hansen
>            Assignee: James Clampffer
>             Fix For: HDFS-8707
>
>         Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch, 
> HDFS-9103.HDFS-8707.006.patch, HDFS-9103.HDFS-8707.3.patch, 
> HDFS-9103.HDFS-8707.4.patch, HDFS-9103.HDFS-8707.5.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to