[ https://issues.apache.org/jira/browse/HDFS-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhe Zhang updated HDFS-7782:
----------------------------
    Attachment: HDFS-7782-006.patch

Thanks Jing and Walter for the thorough reviews!

bq. MiniDFSCluster.injectBlocks(..) filled block with repeated byte DEFAULT_DATABYTE
Yes, this bothers me as well. I will file another JIRA to extend {{SimulatedFSDataset}}.

bq. I'm a little confused where each groupSize comes from.
This is basically the width of the striping group. We don't have a configurable schema yet, so it should always be NUM_DATA_BLOCKS.

bq. In DFSClient#initThreadsNumForStripedReads, the DFSClient object's monitor cannot protect the static field STRIPED_READ_THREAD_POOL.
Good catch! I changed both {{STRIPED_READ_THREAD_POOL}} and {{HEDGED_READ_THREAD_POOL}} to be non-static. They are never accessed in a static way.

bq. Instead of overriding hedgedFetchBlockByteRange and throwing UnsupportedActionException, maybe we can add a check in DFSInputStream#pread to make sure no hedged read for a LocatedStripedBlock.
It's a good point that we should allow other contiguous input streams under the same {{DFSClient}} to enable hedged read. I just updated the code to print a WARN message instead of throwing an exception, and then go on with a non-hedged read. If we change {{DFSInputStream#pread}}, I guess we should also print this WARN message there instead of throwing an exception?

About directly using the provided buffer, I agree with the analysis from Jing and Walter. Right now I'm using the simpler option, which is to issue a task for each cell. This has the disadvantage of creating a block reader multiple times for each DN when the read size is large. I don't see an easy way to avoid that, except for duplicating much of the {{actualGetFromOneDataNode}} code or changing it directly in {{DFSInputStream}}. Maybe we should leave this optimization as a follow-on.
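
To make the "one task per cell" option concrete, here is a minimal sketch of how a pread range could be split into per-cell reads. It is not the code in the patch: the class {{StripedCellSplit}}, the type {{CellRead}}, and the method {{split}} are hypothetical names, and it assumes a plain round-robin layout in which cell i of a block group lives on internal block i % numDataBlocks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from the patch.
class StripedCellSplit {

  /** One per-cell read: which internal block, where in it, and how many bytes. */
  static final class CellRead {
    final int blockIndex;      // index of the internal data block in the group
    final long offsetInBlock;  // byte offset inside that internal block
    final int length;          // bytes to read, never crossing a cell boundary

    CellRead(int blockIndex, long offsetInBlock, int length) {
      this.blockIndex = blockIndex;
      this.offsetInBlock = offsetInBlock;
      this.length = length;
    }
  }

  /**
   * Splits the logical range [start, start + len) of a block group into
   * reads that each stay within one cell. Assumes round-robin striping:
   * cell i is stored on internal block (i % numDataBlocks).
   */
  static List<CellRead> split(long start, int len, int cellSize, int numDataBlocks) {
    List<CellRead> reads = new ArrayList<>();
    long pos = start;
    long end = start + len;
    while (pos < end) {
      long cellIdx = pos / cellSize;
      int offsetInCell = (int) (pos % cellSize);
      // Stop at the end of the current cell or the end of the request.
      int n = (int) Math.min(cellSize - offsetInCell, end - pos);
      int blockIndex = (int) (cellIdx % numDataBlocks);
      long offsetInBlock = (cellIdx / numDataBlocks) * cellSize + offsetInCell;
      reads.add(new CellRead(blockIndex, offsetInBlock, n));
      pos += n;
    }
    return reads;
  }

  public static void main(String[] args) {
    // Example values chosen for illustration: 128-byte cells, 3 data blocks.
    for (CellRead r : split(100, 300, 128, 3)) {
      System.out.println("block=" + r.blockIndex
          + " offset=" + r.offsetInBlock + " len=" + r.length);
    }
  }
}
```

With a large read, the same internal block shows up in several {{CellRead}} entries (one per stripe), which is where the repeated block reader creation per DN comes from if each entry becomes its own task.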

> Erasure coding: pread from files in striped layout
> --------------------------------------------------
>
>                 Key: HDFS-7782
>                 URL: https://issues.apache.org/jira/browse/HDFS-7782
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Zhe Zhang
>         Attachments: HDFS-7782-000.patch, HDFS-7782-001.patch, HDFS-7782-002.patch, HDFS-7782-003.patch, HDFS-7782-004.patch, HDFS-7782-005.patch, HDFS-7782-006.patch
>
>
> A client that wants to read a file should not need to know or handle the
> file's layout. This sub-task adds logic to DFSInputStream to support
> reading files in striped layout.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)