[
https://issues.apache.org/jira/browse/HDFS-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524652#comment-14524652
]
Zhe Zhang commented on HDFS-7678:
---------------------------------
Thanks Andrew for the quality review! Attaching the 007 patch based on the comments.
# bq. The importance of the largest read portion is not explained,
Nice drawing! We need a lot of those for striping :)
# bq. Related, I could see printing the "plan" like this to be nice for
debugging. Also a nice illustration for the test cases.
As briefly described in this [comment |
https://issues.apache.org/jira/browse/HDFS-8281?focusedCommentId=14522770&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14522770],
I plan to file another JIRA focusing on formalizing the concepts / abstractions
around block striping and improving code readability. I like the idea proposed
above; I will add a util method to print read portions (sketch 1 at the end of
this comment shows a rough cut).
# {{deadNodes}}: Actually this info is less helpful in striped reads. In a
contiguous read, when a node is in {{deadNodes}} we can switch to another
replica at no cost. Here, retrying a previously-marked-dead node (which could
have come back from a transient failure) is pretty cheap compared to starting a
round of recovery reads; sketch 2 at the end illustrates this. Of course, if we
want to favor latency over bandwidth usage we can start recovery reads without
trying {{deadNodes}}. Let me know which option you think is better.
# Overall timeout handling: {{stripedReadThresholdMillis}} may look like an
overall timeout, but it is actually a per-op timeout, since all read tasks are
issued at the same time ({{startTime}}); each task is given this
{{stripedReadThresholdMillis}} amount of time to finish (sketch 3 at the end
shows the semantics). If we instead measured the delta from task_1's completion
to task_2's completion and placed a timeout on that delta, it wouldn't be a
fair policy.
# bq. We have this function
Splitting {{fetchBlockByteRange}} is a great idea. It seems you were trying to
say something more in this bullet?
# bq. There's no abstraction here for cells or stripes, we're mucking directly
with internal blocks and byte ranges.
The "formalization" JIRA I mentioned above will address this issue. In this
particular {{fetchBlockByteRange}} case it won't be straightforward: we used to
read stripe by stripe, then changed to sending a single block read request to
each DN so that we create a blockReader only once per DN (sketch 4 at the end
contrasts the two shapes). It will be an interesting design question to think
about.
# The current thread pool is to some extent "inherited" from the hedged read
code. Agreed that we want to limit per-DN concurrency, but wouldn't capping a
DN's aggregate load require coordination among clients? Also, don't we want to
limit per-client concurrency too? Sketch 5 at the end shows the client-side
half of this.
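Sketch 1 (the read-portion printer): a rough cut of the util method promised
above. The {{ReadPortion}} shape here is my assumption for illustration, not
the final API from the patch.
{code:java}
import java.util.List;

class ReadPlanPrinterSketch {
  // Assumed shape of ReadPortion, for this sketch only.
  interface ReadPortion {
    long getStartOffsetInBlock();
    List<Integer> getLengths();  // chunk lengths to read on this block
  }

  /** Renders the per-internal-block read plan for logs and tests. */
  static String dumpReadPlan(ReadPortion[] portions) {
    StringBuilder sb = new StringBuilder("Read plan:");
    for (int i = 0; i < portions.length; i++) {
      sb.append(String.format(
          "%n  internal block %d: startOffsetInBlock=%d, chunkLengths=%s",
          i, portions[i].getStartOffsetInBlock(),
          portions[i].getLengths()));
    }
    return sb.toString();
  }
}
{code}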
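Sketch 2 (the {{deadNodes}} retry option): a toy illustration of retrying a
previously-marked-dead node before paying for a recovery read. All types and
method bodies are stand-ins, not {{DFSInputStream}} internals.
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;

class DeadNodeRetrySketch {
  // Stand-in for DFSInputStream's deadNodes map.
  private final ConcurrentHashMap<String, Long> deadNodes =
      new ConcurrentHashMap<String, Long>();

  ByteBuffer readChunk(String dn) throws IOException {
    if (!deadNodes.containsKey(dn)) {
      return readFromNode(dn);
    }
    try {
      // Retrying is cheap next to a full round of recovery reads;
      // the node may have recovered from a transient failure.
      ByteBuffer buf = readFromNode(dn);
      deadNodes.remove(dn);  // node is back, unmark it
      return buf;
    } catch (IOException e) {
      // Still dead: fall back to decoding the chunk from parity.
      return recoveryRead(dn);
    }
  }

  // Elided: the real block-reader and decode paths.
  private ByteBuffer readFromNode(String dn) throws IOException {
    throw new UnsupportedOperationException("elided");
  }
  private ByteBuffer recoveryRead(String dn) throws IOException {
    throw new UnsupportedOperationException("elided");
  }
}
{code}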
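Sketch 3 (the per-op timeout semantics): every task shares the same
{{startTime}}, so {{stripedReadThresholdMillis}} is a budget shared by the
whole op rather than a delta between successive completions. Simplified from
the patch; the names are approximate.
{code:java}
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class PerOpTimeoutSketch {
  /** Waits for all tasks under one shared deadline; returns count finished. */
  static int awaitWithSharedDeadline(CompletionService<Void> ecs,
      List<Callable<Void>> tasks, long thresholdMillis)
      throws InterruptedException {
    final long startTime = System.currentTimeMillis();
    for (Callable<Void> t : tasks) {
      ecs.submit(t);
    }
    int finished = 0;
    for (int i = 0; i < tasks.size(); i++) {
      // Every task's deadline is startTime + thresholdMillis, so each
      // poll() only gets the remaining budget, not a fresh threshold.
      long remaining = thresholdMillis
          - (System.currentTimeMillis() - startTime);
      Future<Void> done = ecs.poll(Math.max(0, remaining),
          TimeUnit.MILLISECONDS);
      if (done == null) {
        break;  // shared deadline exceeded; caller starts recovery reads
      }
      finished++;
    }
    return finished;
  }
}
{code}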
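Sketch 4 (the two request shapes in {{fetchBlockByteRange}}): a caricature of
stripe-by-stripe reads vs. one merged read per DN. All types are toy
stand-ins; the point is just how many blockReaders get created.
{code:java}
import java.io.IOException;

class RequestShapeSketch {
  interface BlockReader { void readFully() throws IOException; }

  // Hypothetical factory: one reader per (datanode, byte range).
  BlockReader newReader(String dn, long offset, long len) {
    throw new UnsupportedOperationException("elided");
  }

  // (a) stripe-by-stripe: numStripes readers per DN.
  void readByStripe(String[] dns, long cellSize, int numStripes)
      throws IOException {
    for (int s = 0; s < numStripes; s++) {
      for (String dn : dns) {
        newReader(dn, s * cellSize, cellSize).readFully();
      }
    }
  }

  // (b) current approach: a single reader per DN for the merged range.
  void readByBlock(String[] dns, long cellSize, int numStripes)
      throws IOException {
    for (String dn : dns) {
      newReader(dn, 0, cellSize * numStripes).readFully();
    }
  }
}
{code}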
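Sketch 5 (per-DN concurrency limiting): my reading of the client-side part of
the suggestion. This only bounds one client's concurrency against a DN;
capping a DN's aggregate load across clients would need coordination
somewhere, hence my question above.
{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

class PerDnLimiterSketch {
  private final int maxPerDn;
  private final ConcurrentHashMap<String, Semaphore> perDn =
      new ConcurrentHashMap<String, Semaphore>();

  PerDnLimiterSketch(int maxPerDn) {
    this.maxPerDn = maxPerDn;
  }

  /** Runs a read under this client's per-DN permit cap. */
  <T> T withPermit(String dn, Callable<T> read) throws Exception {
    Semaphore s = perDn.get(dn);
    if (s == null) {
      perDn.putIfAbsent(dn, new Semaphore(maxPerDn));
      s = perDn.get(dn);
    }
    s.acquire();
    try {
      return read.call();
    } finally {
      s.release();
    }
  }
}
{code}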
> Erasure coding: DFSInputStream with decode functionality
> --------------------------------------------------------
>
> Key: HDFS-7678
> URL: https://issues.apache.org/jira/browse/HDFS-7678
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: HDFS-7285
> Reporter: Li Bo
> Assignee: Zhe Zhang
> Attachments: BlockGroupReader.patch, HDFS-7678-HDFS-7285.002.patch,
> HDFS-7678-HDFS-7285.003.patch, HDFS-7678-HDFS-7285.004.patch,
> HDFS-7678-HDFS-7285.005.patch, HDFS-7678-HDFS-7285.006.patch,
> HDFS-7678-HDFS-7285.007.patch, HDFS-7678.000.patch, HDFS-7678.001.patch
>
>
> A block group reader will read data from a BlockGroup whether it is in
> striping layout or contiguous layout. Corrupt blocks can be known before
> reading (reported by the namenode) or discovered during reading. The block
> group reader needs to do decoding work when some blocks are found to be
> corrupt.