[ https://issues.apache.org/jira/browse/HDFS-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhe Zhang updated HDFS-7782:
----------------------------
Attachment: HDFS-7782-006.patch
Thanks Jing and Walter for the thorough reviews!
bq. MiniDFSCluster.injectBlocks(..) filled block with repeated byte
DEFAULT_DATABYTE
Yes, this bothers me as well. I will file another JIRA to extend
{{SimulatedFSDataset}}.
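Repeated-byte data is weak for striped reads because a read that returns bytes from the wrong offset (for example, the wrong cell or the wrong internal block) still matches the expected content. A minimal sketch of position-dependent test data follows. {{PositionalTestData}} is a hypothetical helper, not part of {{MiniDFSCluster}} or {{SimulatedFSDataset}}, and the modulo-251 pattern is an arbitrary assumption.

```java
/**
 * Hypothetical test helper: each byte is derived from its absolute file
 * offset, so data read from the wrong position fails verification.
 */
final class PositionalTestData {
  private PositionalTestData() {}

  /** Byte expected at an absolute file offset (pattern is an assumption). */
  static byte expectedByte(long fileOffset) {
    // 251 is prime, so the pattern does not repeat on power-of-two
    // boundaries such as cell or block sizes.
    return (byte) (fileOffset % 251);
  }

  /** Fills buf with the bytes that should be stored starting at fileOffset. */
  static void fill(byte[] buf, long fileOffset) {
    for (int i = 0; i < buf.length; i++) {
      buf[i] = expectedByte(fileOffset + i);
    }
  }

  /** True iff buf holds exactly the bytes stored at [fileOffset, fileOffset + buf.length). */
  static boolean verify(byte[] buf, long fileOffset) {
    for (int i = 0; i < buf.length; i++) {
      if (buf[i] != expectedByte(fileOffset + i)) {
        return false;
      }
    }
    return true;
  }
}
```

With a constant byte, {{verify}} would pass for any offset; with this pattern, data read from an offset shifted by even one byte is rejected.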
bq. I'm a little confused where each groupSize comes from.
This is basically the width of the striping group. We don't have a
configurable schema yet, so it should always be NUM_DATA_BLOCKS.
bq. In DFSClient#initThreadsNumForStripedReads, the DFSClient object's monitor
cannot protect the static field STRIPED_READ_THREAD_POOL.
Good catch! I changed both {{STRIPED_READ_THREAD_POOL}} and
{{HEDGED_READ_THREAD_POOL}} to be non-static. They are never accessed in a
static way.
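A minimal sketch of why the per-instance field resolves the race: a synchronized instance method locks only that object's monitor, so two clients initializing a shared static field would each hold a different lock. Once the pool is an instance field, each client's monitor guards its own pool. {{ReadPoolOwner}} is a hypothetical stand-in for {{DFSClient}}, and the pool parameters (60 s keep-alive, {{SynchronousQueue}}, caller-runs fallback) are assumptions rather than the patch's values.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical client that owns its striped-read pool (not DFSClient). */
final class ReadPoolOwner {
  // Instance field guarded by "this": each client's monitor protects
  // only its own pool, so synchronized methods are sufficient.
  private ThreadPoolExecutor stripedReadPool;

  synchronized void initThreadsNumForStripedReads(int numThreads) {
    if (numThreads <= 0 || stripedReadPool != null) {
      return; // striped reads disabled, or pool already created
    }
    stripedReadPool = new ThreadPoolExecutor(1, numThreads, 60, TimeUnit.SECONDS,
        new SynchronousQueue<Runnable>(),
        // When all threads are busy, run the task in the caller's thread.
        new ThreadPoolExecutor.CallerRunsPolicy());
    stripedReadPool.allowCoreThreadTimeOut(true);
  }

  synchronized ThreadPoolExecutor getStripedReadPool() {
    return stripedReadPool;
  }

  synchronized void close() {
    if (stripedReadPool != null) {
      stripedReadPool.shutdownNow();
    }
  }
}
```

Each owner gets an independent pool, and a second initialization call on the same owner is a no-op.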
bq. Instead of overriding hedgedFetchBlockByteRange and throwing
UnsupportedActionException, maybe we can add a check in DFSInputStream#pread to
make sure no hedged read is issued for a LocatedStripedBlock.
It's a good point that other contiguous input streams under the same
{{DFSClient}} should still be able to use hedged reads. I just updated the code
to print a WARN message instead of throwing an exception, and then fall back to
a non-hedged read. If we move the check into {{DFSInputStream#pread}}, I guess
it should also print this WARN message rather than throw an exception?
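The warn-and-fall-back behavior can be sketched as below. {{PreadDispatcher}} and its boolean inputs are hypothetical; a real check would test whether the located block is a {{LocatedStripedBlock}}, and {{java.util.logging}} stands in for Hadoop's logger here.

```java
import java.util.logging.Logger;

/** Hypothetical dispatcher; the real DFSInputStream logic differs. */
final class PreadDispatcher {
  private static final Logger LOG = Logger.getLogger(PreadDispatcher.class.getName());

  enum Path { HEDGED, NON_HEDGED }

  /**
   * Chooses how to serve one pread. Striped blocks never use hedged reads,
   * but instead of throwing we log a warning and fall back, so contiguous
   * streams from the same client keep hedging enabled.
   */
  static Path choosePath(boolean hedgedReadEnabled, boolean isStripedBlock) {
    if (!hedgedReadEnabled) {
      return Path.NON_HEDGED;
    }
    if (isStripedBlock) {
      LOG.warning("Hedged read is not supported for striped blocks; "
          + "falling back to non-hedged read");
      return Path.NON_HEDGED;
    }
    return Path.HEDGED;
  }
}
```

The design choice is that the hedged-read setting stays a per-client option, while the per-block check decides whether it can actually be applied.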
About directly using the provided buffer, I agree with the analysis from Jing
and Walter. Right now I'm using the simpler option, which is to issue a task
for each cell. The disadvantage is that a block reader is created multiple
times for each DN when the read size is large. I don't see an easy way to avoid
that, except by duplicating much of the {{actualGetFromOneDataNode}} code or
changing it directly in {{DFSInputStream}}. Maybe we should leave this
optimization as a follow-on.
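The one-task-per-cell approach can be sketched as follows, assuming a simple round-robin layout where cell i of a block group lives on internal block i % groupSize (groupSize being the striping width, i.e. NUM_DATA_BLOCKS for now) and the read stays within one block group. {{CellSplitter}} and {{CellTask}} are hypothetical names, not classes from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical splitter: one read task per cell touched by a pread. */
final class CellSplitter {
  private CellSplitter() {}

  static final class CellTask {
    final int blockIndex;     // internal block within the group
    final long offsetInBlock; // where to start reading in that block
    final int length;         // bytes to read from this cell
    final int bufOffset;      // destination offset in the caller's buffer

    CellTask(int blockIndex, long offsetInBlock, int length, int bufOffset) {
      this.blockIndex = blockIndex;
      this.offsetInBlock = offsetInBlock;
      this.length = length;
      this.bufOffset = bufOffset;
    }
  }

  /** Splits [start, start + len) of a block group into per-cell tasks. */
  static List<CellTask> split(long start, int len, int cellSize, int groupSize) {
    List<CellTask> tasks = new ArrayList<>();
    long pos = start;
    int bufOffset = 0;
    while (bufOffset < len) {
      long cellIdx = pos / cellSize;
      int inCell = (int) (pos % cellSize);
      // Read to the end of this cell or the end of the request, whichever is first.
      int n = Math.min(cellSize - inCell, len - bufOffset);
      int blockIndex = (int) (cellIdx % groupSize);
      // Cells on the same internal block are stacked one stripe after another.
      long offsetInBlock = (cellIdx / groupSize) * cellSize + inCell;
      tasks.add(new CellTask(blockIndex, offsetInBlock, n, bufOffset));
      pos += n;
      bufOffset += n;
    }
    return tasks;
  }
}
```

With cellSize = 4 and groupSize = 3, {{split(2, 12, 4, 3)}} yields four tasks, and internal block 0 appears twice (offset 2, then offset 4). That repetition is what causes a block reader to be created more than once per DN for large reads.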
> Erasure coding: pread from files in striped layout
> --------------------------------------------------
>
> Key: HDFS-7782
> URL: https://issues.apache.org/jira/browse/HDFS-7782
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Li Bo
> Assignee: Zhe Zhang
> Attachments: HDFS-7782-000.patch, HDFS-7782-001.patch,
> HDFS-7782-002.patch, HDFS-7782-003.patch, HDFS-7782-004.patch,
> HDFS-7782-005.patch, HDFS-7782-006.patch
>
>
> A client reading a file should not need to know or handle the file's layout.
> This sub-task adds logic to DFSInputStream to support reading files in
> striped layout.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)