[
https://issues.apache.org/jira/browse/HDFS-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377027#comment-14377027
]
Jing Zhao commented on HDFS-7782:
---------------------------------
Thanks for working on this, Zhe! Some early comments and questions:
# The current {{DFSStripedInputStream#hedgedFetchBlockByteRange}}
implementation looks like a parallel read rather than a "hedged" read. A "hedged"
read means "if a read from a replica is slow, start up another parallel read
against a different block replica" in order to bound latency. For EC, if we do
not read parity data, we simply read in parallel from all the DNs that store the
different data blocks.
# We should avoid unnecessary data copies in the implementation. The
current patch first reads data into temporary byte arrays and later copies it
into the given buffer. It would be better to read data directly into the
given byte array. You may need to extend {{getFromOneDataNode}} to support this
for parallel reading.
# Besides the current end-to-end tests in {{TestReadStripedFile}}, we need to
add more tests to verify that the calculations in {{planReadPortions}} and
{{parseStripedBlockGroup}} are correct in all scenarios.
# I assume read failures/timeouts will be handled in a separate JIRA?
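To make the distinction in item 1 concrete, here is a minimal Java sketch of a hedged read. {{HedgedReadSketch}} and {{hedgedRead}} are hypothetical names, not the {{DFSInputStream}} API. The read is started against one replica; only if it has not completed within a threshold is the same read started against a second replica, and whichever result arrives first is returned. Failure handling is deliberately omitted (see item 4).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of a hedged read; not an HDFS class. */
final class HedgedReadSketch {
  static <T> T hedgedRead(ExecutorService pool, Callable<T> primary,
      Callable<T> backup, long thresholdMillis)
      throws InterruptedException, ExecutionException {
    ExecutorCompletionService<T> ecs = new ExecutorCompletionService<>(pool);
    Future<T> first = ecs.submit(primary);
    // Wait for the primary replica only up to the hedging threshold.
    Future<T> done = ecs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
    if (done != null) {
      // Note: if the primary failed, get() throws; failover is not sketched here.
      return done.get();
    }
    // Primary is slow: hedge by reading the same range from another replica.
    Future<T> second = ecs.submit(backup);
    done = ecs.take(); // whichever of the two finishes first
    T result = done.get();
    // Cancel the slower request; cancelling a finished future is a no-op.
    first.cancel(true);
    second.cancel(true);
    return result;
  }
}
```

By contrast, a striped read that does not involve parity needs the portion from every data block, so it must wait for all of the reads rather than the fastest one. That is a parallel read, not a hedged one.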
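For item 2, a minimal sketch of parallel reads that write straight into the caller's buffer. {{BlockReader}}, {{ReadPortion}}, and {{readIntoBuffer}} are hypothetical names used only for illustration, not existing HDFS classes. Each task is given a disjoint slice of {{buf}}, so no temporary arrays or extra copies are needed, and waiting on each {{Future#get}} makes the bytes written by the worker threads visible to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical reader for one internal block; writes directly into dst. */
interface BlockReader {
  void readFully(long offsetInBlock, byte[] dst, int dstOff, int len) throws Exception;
}

/** One piece of a request: source block, offset in it, and target slice of the buffer. */
final class ReadPortion {
  final int blockIndex;
  final long offsetInBlock;
  final int bufOffset;
  final int length;

  ReadPortion(int blockIndex, long offsetInBlock, int bufOffset, int length) {
    this.blockIndex = blockIndex;
    this.offsetInBlock = offsetInBlock;
    this.bufOffset = bufOffset;
    this.length = length;
  }
}

final class ParallelStripedReadSketch {
  static void readIntoBuffer(ExecutorService pool, BlockReader[] readers,
      List<ReadPortion> portions, byte[] buf)
      throws InterruptedException, ExecutionException {
    List<Future<Void>> futures = new ArrayList<>();
    for (ReadPortion p : portions) {
      // Each task fills its own disjoint slice of buf: no intermediate copy.
      Callable<Void> task = () -> {
        readers[p.blockIndex].readFully(p.offsetInBlock, buf, p.bufOffset, p.length);
        return null;
      };
      futures.add(pool.submit(task));
    }
    // Every portion is required, so wait for all of them (not just the first).
    for (Future<Void> f : futures) {
      f.get();
    }
  }
}
```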
> Read a striping layout file from client side
> --------------------------------------------
>
> Key: HDFS-7782
> URL: https://issues.apache.org/jira/browse/HDFS-7782
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Li Bo
> Assignee: Zhe Zhang
> Attachments: HDFS-7782-000.patch, HDFS-7782-001.patch,
> HDFS-7782-002.patch, HDFS-7782-003.patch
>
>
> A client that wants to read a file should not need to know or handle the
> file's layout. This sub-task adds logic to DFSInputStream to support
> reading striping layout files.
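A minimal sketch of the offset arithmetic a client-side striped read needs, assuming cells of {{cellSize}} bytes are laid out round-robin across {{dataBlkNum}} data blocks and ignoring parity blocks. {{StripeMath.plan}} is a hypothetical helper, not the patch's {{planReadPortions}}. It splits a logical range into per-cell pieces, each naming the internal block, the offset within that block, and the destination offset in the caller's buffer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical striping arithmetic; assumes round-robin cells, data blocks only. */
final class StripeMath {
  /** A contiguous piece of the read that lies within a single cell. */
  static final class Piece {
    final int blockIndex;
    final long offsetInBlock;
    final int bufOffset;
    final int length;

    Piece(int blockIndex, long offsetInBlock, int bufOffset, int length) {
      this.blockIndex = blockIndex;
      this.offsetInBlock = offsetInBlock;
      this.bufOffset = bufOffset;
      this.length = length;
    }

    @Override
    public String toString() {
      return "blk" + blockIndex + "@" + offsetInBlock + "+" + length + "->" + bufOffset;
    }
  }

  static List<Piece> plan(long start, int len, int cellSize, int dataBlkNum) {
    List<Piece> pieces = new ArrayList<>();
    long pos = start;
    int bufOff = 0;
    while (bufOff < len) {
      long cellIdx = pos / cellSize;                 // global cell number in the file
      int offInCell = (int) (pos % cellSize);
      int blockIndex = (int) (cellIdx % dataBlkNum); // cells rotate across data blocks
      long stripeIdx = cellIdx / dataBlkNum;         // which stripe (row) the cell is in
      long offsetInBlock = stripeIdx * cellSize + offInCell;
      // Stop at the end of the current cell or of the request, whichever is first.
      int n = Math.min(cellSize - offInCell, len - bufOff);
      pieces.add(new Piece(blockIndex, offsetInBlock, bufOff, n));
      pos += n;
      bufOff += n;
    }
    return pieces;
  }
}
```

For example, with {{cellSize}} = 4 and {{dataBlkNum}} = 3, reading 10 bytes at offset 6 yields three pieces: 2 bytes from block 1 at offset 2, 4 bytes from block 2 at offset 0, and 4 bytes from block 0 at offset 4. A unit test for the real calculation could compare its output against a brute-force mapping like this one over many (offset, length) combinations, including ranges that start or end mid-cell and ranges that span several stripes.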
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)