[ 
https://issues.apache.org/jira/browse/HDFS-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7782:
----------------------------
    Attachment: HDFS-7782-006.patch

Thanks Jing and Walter for the thorough reviews!

bq. MiniDFSCluster.injectBlocks(..) filled block with repeated byte 
DEFAULT_DATABYTE
Yes, this bothers me as well. I will file another JIRA to extend 
{{SimulatedFSDataset}}.

bq. I'm a little confused where each groupSize comes from.
This is basically the width of the striping group. We don't have a 
configurable schema yet, so for now it should always be NUM_DATA_BLOCKS.

bq. In DFSClient#initThreadsNumForStripedReads, the DFSClient object's monitor 
cannot protect the static field STRIPED_READ_THREAD_POOL.
Good catch! I changed both {{STRIPED_READ_THREAD_POOL}} and 
{{HEDGED_READ_THREAD_POOL}} to be non-static. They are never accessed in a 
static way.
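
A minimal sketch of the non-static arrangement, for illustration only. 
{{ClientSketch}} and its method names are hypothetical stand-ins, not the 
actual {{DFSClient}} code. Once the pool is an instance field, the instance 
monitor taken by a {{synchronized}} method is enough to guard its lazy 
initialization; with a static field, two different client objects would lock 
two different monitors and could race.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a client class; NOT the real DFSClient.
class ClientSketch {
  // Instance field: each client owns its own pool, so "this" is the right lock.
  private ExecutorService stripedReadPool;

  // synchronized on "this": safe because the guarded field is per-instance.
  synchronized ExecutorService initStripedReadPool(int numThreads) {
    if (stripedReadPool == null) {
      stripedReadPool = Executors.newFixedThreadPool(numThreads);
    }
    return stripedReadPool;
  }

  // Release the pool when the client is closed.
  synchronized void close() {
    if (stripedReadPool != null) {
      stripedReadPool.shutdown();
      stripedReadPool = null;
    }
  }
}
```

Repeated calls on one client return the same pool, while two clients get 
independent pools.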

bq. Instead of overriding hedgedFetchBlockByteRange and throwing 
UnsupportedActionException, maybe we can add a check in DFSInputStream#pread to 
make sure no hedged read for a LocatedStripedBlock.
It's a good point that other contiguous input streams under the same 
{{DFSClient}} should still be able to use hedged read. I just updated the code 
to print a WARN message instead of throwing an exception, and then proceed 
with a non-hedged read. If we change {{DFSInputStream#pread}}, I guess we 
should also print this WARN message there instead of throwing an exception?
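
A rough sketch of the warn-and-fall-back behavior described above. 
{{PreadDispatchSketch}}, {{ReadPath}} and {{choosePath}} are hypothetical 
names, and {{java.util.logging}} is used only to keep the example 
self-contained; the real code path and logger differ.

```java
import java.util.logging.Logger;

// Hypothetical dispatch helper; NOT the actual DFSInputStream logic.
class PreadDispatchSketch {
  private static final Logger LOG = Logger.getLogger("PreadDispatchSketch");

  enum ReadPath { HEDGED, NON_HEDGED }

  // Decide which read path to take for one block.
  static ReadPath choosePath(boolean hedgedReadEnabled, boolean isStripedBlock) {
    if (hedgedReadEnabled && isStripedBlock) {
      // Warn instead of throwing, then continue with the plain read path.
      LOG.warning("Hedged read is not supported for striped blocks; "
          + "falling back to non-hedged read");
      return ReadPath.NON_HEDGED;
    }
    return hedgedReadEnabled ? ReadPath.HEDGED : ReadPath.NON_HEDGED;
  }
}
```

Contiguous blocks under the same client keep the hedged path, so enabling 
hedged read on the client does not break striped files.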

About directly using the provided buffer, I agree with the analysis from Jing 
and Walter. Right now I'm using the simpler option, which is to issue a task 
for each cell. This has the disadvantage of creating a block reader multiple 
times for each DN when the read size is large. I don't see an easy way to 
avoid that, except for duplicating much of the {{actualGetFromOneDataNode}} 
code or changing it directly in {{DFSInputStream}}. Maybe we should leave this 
optimization as a follow-on.
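
To make the task-per-cell option concrete, here is a small sketch of how a 
logical byte range could be split into per-cell reads. It assumes cells are 
laid out round-robin across the data blocks of the group; {{CellReadPlanSketch}}, 
{{CellRead}}, {{plan}} and the parameters {{cellSize}} and {{numDataBlocks}} 
are illustrative names, not identifiers from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner; NOT code from the HDFS-7782 patch.
class CellReadPlanSketch {
  // One read task: which internal block, where in it, and where in the buffer.
  static final class CellRead {
    final int blockIdx;
    final long offsetInBlock;
    final int length;
    final int bufOffset;

    CellRead(int blockIdx, long offsetInBlock, int length, int bufOffset) {
      this.blockIdx = blockIdx;
      this.offsetInBlock = offsetInBlock;
      this.length = length;
      this.bufOffset = bufOffset;
    }
  }

  // Split the logical range [start, start + len) into one task per cell,
  // assuming round-robin placement of cells over numDataBlocks blocks.
  static List<CellRead> plan(long start, int len, int cellSize, int numDataBlocks) {
    List<CellRead> tasks = new ArrayList<>();
    long pos = start;
    int bufOffset = 0;
    while (bufOffset < len) {
      long cellIdx = pos / cellSize;
      int offsetInCell = (int) (pos % cellSize);
      // Read to the end of this cell or the end of the request, whichever is first.
      int n = Math.min(cellSize - offsetInCell, len - bufOffset);
      int blockIdx = (int) (cellIdx % numDataBlocks);
      long offsetInBlock = (cellIdx / numDataBlocks) * cellSize + offsetInCell;
      tasks.add(new CellRead(blockIdx, offsetInBlock, n, bufOffset));
      pos += n;
      bufOffset += n;
    }
    return tasks;
  }
}
```

With a large read, many tasks map to the same {{blockIdx}}, which is where 
the repeated block reader creation per DN comes from.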

> Erasure coding: pread from files in striped layout
> --------------------------------------------------
>
>                 Key: HDFS-7782
>                 URL: https://issues.apache.org/jira/browse/HDFS-7782
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Zhe Zhang
>         Attachments: HDFS-7782-000.patch, HDFS-7782-001.patch, 
> HDFS-7782-002.patch, HDFS-7782-003.patch, HDFS-7782-004.patch, 
> HDFS-7782-005.patch, HDFS-7782-006.patch
>
>
> A client reading a file should not need to know or handle the file's 
> layout. This sub-task adds logic to DFSInputStream to support reading files 
> in striped layout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
