[ https://issues.apache.org/jira/browse/HDFS-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524652#comment-14524652 ]

Zhe Zhang commented on HDFS-7678:
---------------------------------

Thanks Andrew for the quality review! Attaching the 007 patch based on the comments.

# bq. The importance of the largest read portion is not explained, 
Nice drawing! We need a lot of those for striping :)
# bq. Related, I could see printing the "plan" like this to be nice for 
debugging. Also a nice illustration for the test cases.
As briefly described in this [comment | 
https://issues.apache.org/jira/browse/HDFS-8281?focusedCommentId=14522770&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14522770],
 I plan to file another JIRA focused on formalizing the concepts / abstractions 
around block striping and improving code readability. I like the idea proposed 
above; I will add a util method to print read portions (sketched after this 
list).
# {{deadNodes}}: Actually this info is less helpful in striped reads. In a 
contiguous read, when a node is in {{deadNodes}} we can switch to another 
replica at no cost. Here, retrying a previously-marked-dead node (which 
could have come back from a transient failure) is pretty cheap compared to 
starting a round of recovery reads. Of course, if we want to favor latency 
over bandwidth usage we can start recovery reads without trying 
{{deadNodes}}. Let me know which option you think is better. (A rough 
sketch of the retry-first option follows this list.)
# Overall timeout handling: {{stripedReadThresholdMillis}} may look like an 
overall timeout, but it is actually a per-op timeout, since all the read 
tasks are issued at the same time ({{startTime}}). Each task is given this 
{{stripedReadThresholdMillis}} amount of time to finish (sketched after this 
list). If we instead measured the delta from task_1's completion to task_2's 
completion and placed a timeout on that delta, it wouldn't be a fair policy.
# bq. We have this function
Splitting {{fetchBlockByteRange}} is a great idea. It seems you were trying to 
say something more in this bullet?
# bq. There's no abstraction here for cells or stripes, we're mucking directly 
with internal blocks and byte ranges.
The "formalization" JIRA I mentioned above will address this issue. In this 
particular {{fetchBlockByteRange}} case it won't be too straightforward: we 
used to read stripe by stripe, and then changed to sending a single block read 
request to each DN so that we create a blockReader only once per DN. It will 
be an interesting design question to think about.
# The current thread pool is to some extent "inherited" from the hedged read 
code. Agreed that we want to limit per-DN concurrency, but wouldn't that 
require coordination among clients? Also, don't we want to limit per-client 
concurrency too? (A client-side sketch follows this list.)
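
Here is a rough sketch of the read-portion printing util mentioned in item 2. 
The {{ReadPortion}} fields (block index, offset in block, length) are 
placeholders for whatever structure the striping code ends up exposing; the 
real field names may differ:

{code:java}
import java.util.List;

public class ReadPortionPrinter {
  /** Hypothetical stand-in for the internal read-portion structure. */
  public static class ReadPortion {
    final int blockIndex;          // which internal block in the group
    final long startOffsetInBlock; // where the read starts in that block
    final int length;              // how many bytes to read
    ReadPortion(int blockIndex, long startOffsetInBlock, int length) {
      this.blockIndex = blockIndex;
      this.startOffsetInBlock = startOffsetInBlock;
      this.length = length;
    }
  }

  /** Renders the read "plan", one portion per line, for logs and tests. */
  public static String printPlan(List<ReadPortion> portions) {
    StringBuilder sb = new StringBuilder("Read plan:");
    for (ReadPortion p : portions) {
      sb.append(String.format("%n  block#%d offset=%d len=%d",
          p.blockIndex, p.startOffsetInBlock, p.length));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(printPlan(List.of(
        new ReadPortion(0, 0, 65536), new ReadPortion(1, 0, 65536))));
  }
}
{code}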
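
For item 3, the bandwidth-favoring option in pseudologic: retry a 
previously-marked-dead node once (cheap if it recovered from a transient 
failure) before paying for a recovery read. The {{direct}} / {{recovery}} 
readers are hypothetical placeholders, not actual client APIs:

{code:java}
import java.util.Set;

public class DeadNodeRetrySketch {
  interface Reader { byte[] read(String dn) throws Exception; }

  static byte[] readChunk(String dn, Set<String> deadNodes,
      Reader direct, Reader recovery) throws Exception {
    try {
      // Retrying a node in deadNodes is cheap in striped reads...
      byte[] data = direct.read(dn);
      deadNodes.remove(dn); // the node came back; clear the stale mark
      return data;
    } catch (Exception e) {
      deadNodes.add(dn);
      // ...while a recovery read costs extra bandwidth: it must fetch
      // parity plus other data blocks and decode.
      return recovery.read(dn);
    }
  }
}
{code}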
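
To make item 4 concrete, a minimal sketch of the per-op timeout semantics: 
all tasks are submitted at {{startTime}}, and every result is awaited against 
the same absolute deadline {{startTime + stripedReadThresholdMillis}}, so each 
task gets the full threshold from issue time. The sleeping tasks stand in for 
actual block reads:

{code:java}
import java.util.concurrent.*;

public class StripedReadTimeoutSketch {
  public static void main(String[] args) throws Exception {
    final long stripedReadThresholdMillis = 5000;
    final int numTasks = 4;
    ExecutorService pool = Executors.newFixedThreadPool(numTasks);
    CompletionService<Integer> cs = new ExecutorCompletionService<>(pool);

    long startTime = System.currentTimeMillis();
    for (int i = 0; i < numTasks; i++) {
      final int id = i;
      cs.submit(() -> { Thread.sleep(100L * id); return id; }); // fake read
    }

    for (int done = 0; done < numTasks; done++) {
      // Every task shares the deadline startTime + threshold.
      long remaining = startTime + stripedReadThresholdMillis
          - System.currentTimeMillis();
      Future<Integer> f = cs.poll(Math.max(remaining, 0),
          TimeUnit.MILLISECONDS);
      if (f == null) {
        System.out.println("threshold exceeded; start recovery reads");
        break;
      }
      System.out.println("task " + f.get() + " finished in time");
    }
    pool.shutdownNow();
  }
}
{code}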
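
And for item 7, a hedged sketch of what purely client-side per-DN throttling 
could look like: one {{Semaphore}} per datanode address. This only bounds a 
single client's in-flight reads, which is exactly why cluster-wide fairness 
would still need coordination among clients:

{code:java}
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

public class PerDatanodeLimiter {
  private final int maxPerDn;
  private final Map<String, Semaphore> limits = new ConcurrentHashMap<>();

  public PerDatanodeLimiter(int maxPerDn) { this.maxPerDn = maxPerDn; }

  /** Runs a read against dnAddr with at most maxPerDn reads in flight. */
  public <T> T withLimit(String dnAddr, Callable<T> read) throws Exception {
    Semaphore s = limits.computeIfAbsent(dnAddr,
        k -> new Semaphore(maxPerDn));
    s.acquire();
    try {
      return read.call();
    } finally {
      s.release();
    }
  }
}
{code}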

> Erasure coding: DFSInputStream with decode functionality
> --------------------------------------------------------
>
>                 Key: HDFS-7678
>                 URL: https://issues.apache.org/jira/browse/HDFS-7678
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7285
>            Reporter: Li Bo
>            Assignee: Zhe Zhang
>         Attachments: BlockGroupReader.patch, HDFS-7678-HDFS-7285.002.patch, 
> HDFS-7678-HDFS-7285.003.patch, HDFS-7678-HDFS-7285.004.patch, 
> HDFS-7678-HDFS-7285.005.patch, HDFS-7678-HDFS-7285.006.patch, 
> HDFS-7678-HDFS-7285.007.patch, HDFS-7678.000.patch, HDFS-7678.001.patch
>
>
> A block group reader will read data from a BlockGroup, whether it is in 
> striping layout or contiguous layout. The corrupt blocks can be known before 
> reading (told by the namenode), or may only be found during reading. The block 
> group reader needs to do decoding work when some blocks are found corrupt.


