[
https://issues.apache.org/jira/browse/HDFS-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288433#comment-14288433
]
Zhe Zhang commented on HDFS-7653:
---------------------------------
Thanks Bo for the patch. Please find my review below:
About the structure:
# Why do we need a {{UnifiedBlockReader}} interface, when {{BlockReader}}
contains all the required methods?
# {{BlockReader}}:
#* How do you plan to use {{ClientBlockReader}} in {{DFSInputStream}}? Will it
replace the current {{blockReader}}?
#* The purpose of a block reader is to read a single block from a single
DataNode. Instead of changing that logic, I think we need to start multiple
block readers and coordinate them, similar to the [design |
https://issues.apache.org/jira/secure/attachment/12687886/DataStripingSupportinHDFSClient.pdf]
in HDFS-7545.
# {{BlockWriter}}:
#* The {{FSOutputSummer}} class is for _file_ output streams. I don't think
it's appropriate as a block writer.
#* In particular, as I mentioned in a [comment |
https://issues.apache.org/jira/browse/HDFS-7344?focusedCommentId=14273774&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14273774]
under HDFS-7344, I think block transfers between DataNodes can be much simpler
than client-side pipelines. Unless there's an obvious advantage to streaming
data to peer DataNodes on the fly, I don't think we should introduce the
complex write pipeline logic into the DN.
# I think _what can be shared_ between client and DN is the logic to coordinate
multiple reading or writing threads.
#* We can refer to how [Reader |
https://github.com/quantcast/qfs/blob/master/src/cc/libclient/Reader.cc] and
[Writer |
https://github.com/quantcast/qfs/blob/master/src/cc/libclient/Writer.cc] share
the [RSStriper |
https://github.com/quantcast/qfs/blob/master/src/cc/libclient/RSStriper.cc]
logic in QFS.
#* To make things simpler we can even start from developing client and DN code
separately, and abstract out common logic later on.
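To make the points about coordination and shared logic concrete, here is a rough sketch (all names and interfaces here are hypothetical, not from the patch) of keeping the single-block reader unchanged and putting the multi-reader coordination in a shared layer that both {{DFSInputStream}} and a DN coding task could drive, analogous to how QFS's Reader/Writer share RSStriper:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class StripedReaderSketch {
    // Hypothetical stand-in for the existing single-block reader; the
    // point is that it stays unchanged and unaware of striping.
    interface SingleBlockReader {
        byte[] readBlock() throws Exception;
    }

    // Shared coordination layer: start one reader per block in parallel
    // and reassemble the results in stripe order.  Both the client and
    // a DN encoding/decoding task could reuse this same class.
    static List<byte[]> readStripe(List<SingleBlockReader> readers)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(readers.size());
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (SingleBlockReader r : readers) {
                futures.add(pool.submit(r::readBlock));
            }
            List<byte[]> cells = new ArrayList<>();
            for (Future<byte[]> f : futures) {
                cells.add(f.get());   // block until every reader is done
            }
            return cells;
        } finally {
            pool.shutdown();
        }
    }
}
```

The erasure-coding math itself would sit behind this layer, so neither the client streams nor the DN tasks need to duplicate the threading logic.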
Nits:
# {{ClientBlockReader#read(ByteBuffer buf)}}: what's the purpose of the
following code? When will a partially-used buffer be passed in?
{code}
int remaining = buf.remaining();
if(remaining <= 0)
return 0;
byte[] b = new byte[remaining];
int r = read(b, 0, b.length);
for(int i = 0; i < r; i++)
buf.put(b[i]);
{code}
# DatanodeBlockReaderTest:
#* Name should follow the convention and start with Test
#* It doesn't compile; need to change {{MiniDFSCluster.getBlockFile}} to
{{cluster.getBlockFile}}
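On the {{read(ByteBuffer buf)}} nit above: whatever the intended use, the per-byte loop can at least become a bulk put. A sketch of that form (the single-byte-array {{read}} here is a hypothetical stand-in so the example is self-contained):

```java
import java.nio.ByteBuffer;

public class BufferReadSketch {
    // Hypothetical stand-in for read(byte[], int, int); it just fills
    // the requested range with increasing values.
    static int read(byte[] b, int off, int len) {
        for (int i = 0; i < len; i++) {
            b[off + i] = (byte) i;
        }
        return len;
    }

    // Same behavior as the quoted snippet, but with a single bulk put
    // instead of copying one byte at a time.
    static int read(ByteBuffer buf) {
        int remaining = buf.remaining();
        if (remaining <= 0) {
            return 0;
        }
        byte[] b = new byte[remaining];
        int r = read(b, 0, b.length);
        if (r > 0) {
            buf.put(b, 0, r);
        }
        return r;
    }
}
```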
> Block Readers and Writers used in both client side and datanode side
> --------------------------------------------------------------------
>
> Key: HDFS-7653
> URL: https://issues.apache.org/jira/browse/HDFS-7653
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Li Bo
> Assignee: Li Bo
> Attachments: BlockReadersWriters.patch
>
>
> There are many block read/write operations in HDFS-EC. For example, when a
> client writes a file in striping layout, it has to write several blocks to
> several different datanodes; if a datanode wants to do an encoding/decoding
> task, it has to read several blocks from itself and other datanodes, and
> write one or more blocks to itself or other datanodes.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)