[
https://issues.apache.org/jira/browse/HDFS-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524353#comment-14524353
]
Andrew Wang commented on HDFS-7678:
-----------------------------------
Nits / comment requests:
* extra imports in DFSStripedInputStream
* I noticed these config keys aren't present in hdfs-default.xml, which is
where we document the behavior of config keys. I bring it up because I was
thinking about the semantic difference between "overall timeout" vs. "per-op
timeout". Since there are a couple EC keys that need documentation, I'm okay
punting this to a quick follow-on.
* These same keys could also use javadoc on the DFSClientConf getters so we
don't need to hunt in hdfs-default.xml or look at how the code uses the configs
* The importance of the largest read portion is not explained, nor is there a
high-level comment about the overall flow of the recovery logic. Some ASCII art
would help clarify all this; I drew something like this:
{noformat}
     +--+ +--+  |  +--+ +--+
+--+ |  | |  |  |  |  | |  |
|  | |  | +--+  |  |  | |  |
+--+ +--+       |  +--+ +--+
                |
 d1   d2   d3   |   p1   p2
{noformat}
This way it becomes obvious that if d1 fails, you need to lengthen the read on
d3 in addition to grabbing the parity. The TODO to optimize the read is also
clarified: we always calculate the max span and always do max-span reads for
recovery, even though we might only need the part of d3 that's unfetched.
Relatedly, printing the "plan" like this could be nice for debugging, and it
would also make a nice illustration for the test cases; a rough sketch of that
follows this list.
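To make the debugging/illustration suggestion concrete, here is a rough sketch of what printing such a plan could look like. Everything here is made up for illustration (the class, the cell-granularity spans arrays, the 3-data/2-parity layout in main that matches the little diagram above); it is not code from the patch:
{code}
// Hypothetical debugging helper, not part of the patch: render the per-block
// read spans of a striped read as ASCII columns, one row per cell.
public class ReadPlanPrinter {
  /**
   * starts[i]/lens[i] are the offset and length (in cells) to read from
   * internal block i; blocks 0..dataBlks-1 are data, the rest are parity.
   */
  static String render(int[] starts, int[] lens, int dataBlks) {
    int rows = 0;
    for (int i = 0; i < starts.length; i++) {
      rows = Math.max(rows, starts[i] + lens[i]);
    }
    StringBuilder sb = new StringBuilder();
    for (int r = 0; r < rows; r++) {
      for (int i = 0; i < starts.length; i++) {
        boolean read = r >= starts[i] && r < starts[i] + lens[i];
        sb.append(read ? "|##|" : "|..|");
        sb.append(i == dataBlks - 1 ? "  :  " : " ");
      }
      sb.append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // d1 reads cells 1-2, d2 reads cells 0-2, d3 reads cells 0-1; no parity yet.
    System.out.print(render(new int[]{1, 0, 0, 0, 0},
                            new int[]{2, 3, 2, 0, 0}, 3));
  }
}
{code}
Something along these lines could be logged at DEBUG when a recovery read is planned, and reused in test assertions.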
Code:
* AFAICT we don't respect deadNodes in scheduleOneStripedRead, or during the
initial planning phase. I mentioned this a bit in person, but we also need to
think about the dead node marking policy for EC reads, since the timeout is
aggressive.
* I don't understand the overall timeout handling, since if the overall timeout
has elapsed, we still go through the switch and schedule recovery reads. Seems
like something we should unit test.
* We also probably still want a per-operation timeout in addition to the
overall timeout. If the per-op timeout hits, we keep trying to recover, until
the overall timeout runs out (or we run out of blocks).
* The InterruptedException catch should probably rethrow an
InterruptedIOException or similar; the idea is to get us out of this function
ASAP while cleaning up state (see the sketch after this list).
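To tie the timeout and interrupt bullets together, here is a rough sketch of how a per-op timeout, an overall deadline, and the InterruptedIOException rethrow could compose around the striped-read futures. All names are hypothetical (this is not the patch's API), and the recovery scheduling, which is where deadNodes should be consulted, is only hinted at in comments:
{code}
import java.io.InterruptedIOException;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class StripedReadLoopSketch {
  void waitForReads(CompletionService<Integer> readService, int outstanding,
                    long perOpTimeoutMs, long overallTimeoutMs)
      throws InterruptedIOException {
    final long deadline = System.currentTimeMillis() + overallTimeoutMs;
    while (outstanding > 0) {
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        // Overall timeout elapsed: stop scheduling recovery reads and bail out
        // (decode with what we have or fail), rather than falling through the switch.
        break;
      }
      long wait = Math.min(perOpTimeoutMs, remaining);
      try {
        Future<Integer> done = readService.poll(wait, TimeUnit.MILLISECONDS);
        if (done == null) {
          // Per-op timeout: schedule a recovery read here (skipping anything in
          // deadNodes), then keep waiting, still bounded by the overall deadline.
          continue;
        }
        done.get();
        outstanding--;
      } catch (InterruptedException e) {
        // Restore the flag and get out of the read path quickly.
        Thread.currentThread().interrupt();
        throw new InterruptedIOException("interrupted while waiting for striped reads");
      } catch (ExecutionException e) {
        // One read failed: mark the DN dead if appropriate and schedule recovery.
        outstanding--;
      }
    }
  }
}
{code}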
Recommended follow-on work:
* fetchBlockByteRange is an outsize function and could use some splitting. We
have this function
* There's no abstraction here for cells or stripes; we're mucking directly with
internal blocks and byte ranges. IMO all the logic should happen on these
higher-level abstractions, and then we turn it into blocks and byte ranges
afterward. Max-span, for instance, is a hack; we should be calculating recovery
reads cell-by-cell based on what we already have, then combining the cells for
an internal block together to form the actual read (rough sketch below).
* One thought regarding the threadpool: what we really want is to limit the
concurrency per DN, i.e. a threadpool per DN. This works nicely with the read
combining mentioned above, and does one better since it can coalesce recovery
reads with already-queued work. This is like how disk IO queues work (also
sketched below).
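A rough sketch of the per-DN pool idea, purely illustrative (the class name and the keying by a string DN id are made up, not the patch's API):
{code}
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerDatanodeReadPools {
  // One small work queue per DN instead of one shared pool, so per-DN
  // concurrency is bounded and queued reads to the same DN could later be
  // coalesced before they run.
  private final ConcurrentHashMap<String, ExecutorService> pools =
      new ConcurrentHashMap<>();

  <T> Future<T> submit(String datanodeId, Callable<T> read) {
    ExecutorService pool = pools.computeIfAbsent(
        datanodeId, id -> Executors.newSingleThreadExecutor());
    return pool.submit(read);
  }

  void shutdown() {
    pools.values().forEach(ExecutorService::shutdownNow);
  }
}
{code}
A single-threaded executor per DN is the simplest way to bound per-DN concurrency; a combining step could inspect each DN's queue and merge adjacent ranges before they are issued.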
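And going back to the cell/stripe abstraction point, a rough sketch of planning recovery reads cell-by-cell based on what we already have, then merging adjacent cells of each surviving internal block into contiguous reads; again, every type and name here is hypothetical:
{code}
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class CellPlanSketch {
  /** One contiguous byte range to read from an internal block. */
  static class BlockRead {
    final int blockIndex;
    final long offsetInBlock;
    final long length;
    BlockRead(int blockIndex, long offsetInBlock, long length) {
      this.blockIndex = blockIndex;
      this.offsetInBlock = offsetInBlock;
      this.length = length;
    }
  }

  /**
   * fetched[i] marks which cell rows of internal block i are already read.
   * For every cell row needed to decode the failed block, add the missing
   * cells of the surviving blocks, merging adjacent cells per block.
   */
  static List<BlockRead> planRecovery(BitSet[] fetched, BitSet neededRows,
                                      int failedBlock, long cellSize) {
    List<BlockRead> reads = new ArrayList<>();
    for (int blk = 0; blk < fetched.length; blk++) {
      if (blk == failedBlock) {
        continue;
      }
      BitSet missing = (BitSet) neededRows.clone();
      missing.andNot(fetched[blk]);           // cells we still need from blk
      int row = missing.nextSetBit(0);
      while (row >= 0) {
        int end = missing.nextClearBit(row);  // merge a contiguous run of cells
        reads.add(new BlockRead(blk, row * cellSize, (end - row) * cellSize));
        row = missing.nextSetBit(end);
      }
    }
    return reads;
  }
}
{code}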
> Erasure coding: DFSInputStream with decode functionality
> --------------------------------------------------------
>
> Key: HDFS-7678
> URL: https://issues.apache.org/jira/browse/HDFS-7678
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: HDFS-7285
> Reporter: Li Bo
> Assignee: Zhe Zhang
> Attachments: BlockGroupReader.patch, HDFS-7678-HDFS-7285.002.patch,
> HDFS-7678-HDFS-7285.003.patch, HDFS-7678-HDFS-7285.004.patch,
> HDFS-7678-HDFS-7285.005.patch, HDFS-7678-HDFS-7285.006.patch,
> HDFS-7678.000.patch, HDFS-7678.001.patch
>
>
> A block group reader will read data from a BlockGroup whether it is in
> striped or contiguous layout. Corrupt blocks can either be known before
> reading (reported by the namenode) or be discovered during reading. The block
> group reader needs to do decoding work when some blocks are found to be corrupt.