[ 
https://issues.apache.org/jira/browse/HDFS-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7782:
----------------------------
    Attachment: HDFS-7782-004.patch

The new patch has several updates:
# Disabled hedged read, leaving it as a TODO
# Added detailed comments for the class and key methods
# Added unit tests for {{planReadPortions}} and {{parseStripedBlockGroup}}
# Exposed parallel read for testing

I think we need to give more thought to how parallel and hedged reads are 
configured. If we treat them as orthogonal config dimensions, there are 4 
combinations:
# *Non-hedged + non-parallel:* This is just the current serial pread 
implementation ({{fetchBlockByteRange}} in the patch)
# *Non-hedged + parallel:* This is the current parallel read implementation 
({{parallelFetchBlockByteRange}} in the patch)
# *Hedged + parallel:* I imagine we'll extend {{parallelFetchBlockByteRange}} 
to start a new thread to read from a parity block if one of the original 
reading threads is slow.
# *Hedged + non-parallel:* If we follow the current hedged read logic, we 
should start a single thread reading a raw data cell. If it's slow, replacing 
it requires starting another m threads (assuming an m+k schema), of which m-1 
read raw data and 1 reads parity.
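
To make the cost of the two hedged cases concrete, here is a rough sketch of the arithmetic. It is not code from the patch: the class and method names are hypothetical, and it assumes that reconstructing one slow data cell needs m other cells from the same stripe and that at least one parity cell is readable.

```java
public class HedgeCostSketch {
    /**
     * Extra reads needed to reconstruct one slow data cell in an m+k schema.
     *
     * @param m             number of data cells per stripe
     * @param cellsInFlight data cells of this stripe already being read,
     *                      including the slow one (1 for a serial pread,
     *                      up to m for a parallel pread)
     */
    static int extraReadsToHedge(int m, int cellsInFlight) {
        if (m < 1 || cellsInFlight < 1 || cellsInFlight > m) {
            throw new IllegalArgumentException("need 1 <= cellsInFlight <= m");
        }
        // Reconstruction needs m cells other than the slow one;
        // (cellsInFlight - 1) healthy reads are already under way.
        return m - (cellsInFlight - 1);
    }

    public static void main(String[] args) {
        int m = 6; // example value only, not taken from the patch
        System.out.println("serial:   " + extraReadsToHedge(m, 1));
        System.out.println("parallel: " + extraReadsToHedge(m, m));
    }
}
```

With one cell in flight (the hedged + non-parallel case) this gives m extra reads, i.e. m-1 raw data plus 1 parity. With all m data cells in flight (the hedged + parallel case) it gives a single extra parity read.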

Does the config space look too complex for users to understand? A possible 
alternative is to always read in parallel if the pread range spans multiple 
cells. Unlike hedged read, each parallel I/O thread reads only _requested_ 
data, so it doesn't increase total socket / bandwidth usage; it just makes 
the I/O slightly bursty. Hedged vs non-hedged can then be configured using 
the existing switch.
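
A minimal sketch of that alternative, assuming a fixed cell size and a single boolean standing in for the existing hedged read switch. The names ({{ReadModeSketch}}, {{spansMultipleCells}}, {{chooseMode}}) are hypothetical and do not appear in the patch.

```java
public class ReadModeSketch {
    enum Mode { SERIAL, PARALLEL, HEDGED_SERIAL, HEDGED_PARALLEL }

    /** True if the byte range [offset, offset + length) touches more than one cell. */
    static boolean spansMultipleCells(long offset, int length, int cellSize) {
        if (cellSize <= 0) {
            throw new IllegalArgumentException("cellSize must be positive");
        }
        if (length <= 0) {
            return false; // empty range touches no cell boundary
        }
        long firstCell = offset / cellSize;
        long lastCell = (offset + length - 1) / cellSize;
        return lastCell > firstCell;
    }

    /**
     * Parallelism is derived from the request itself; only hedging is
     * user-configured, so users see one switch instead of two.
     */
    static Mode chooseMode(long offset, int length, int cellSize,
                           boolean hedgedEnabled) {
        boolean parallel = spansMultipleCells(offset, length, cellSize);
        if (hedgedEnabled) {
            return parallel ? Mode.HEDGED_PARALLEL : Mode.HEDGED_SERIAL;
        }
        return parallel ? Mode.PARALLEL : Mode.SERIAL;
    }
}
```

Under this scheme a pread that fits within one cell stays serial, and the four combinations above collapse to a single user-visible choice.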

> Read a striping layout file from client side
> --------------------------------------------
>
>                 Key: HDFS-7782
>                 URL: https://issues.apache.org/jira/browse/HDFS-7782
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Zhe Zhang
>         Attachments: HDFS-7782-000.patch, HDFS-7782-001.patch, 
> HDFS-7782-002.patch, HDFS-7782-003.patch, HDFS-7782-004.patch
>
>
> A client reading a file should not need to know or handle the file's 
> layout. This sub-task adds logic to DFSInputStream to support reading 
> striping layout files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
