[jira] [Commented] (HDFS-7653) Block Readers and Writers used in both client side and datanode side

Zhe Zhang (JIRA) Thu, 22 Jan 2015 15:22:49 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288433#comment-14288433
 ]


Zhe Zhang commented on HDFS-7653:
---------------------------------

Thanks Bo for the patch. Please find my review below:

About the structure:
# Why do we need a {{UnifiedBlockReader}} interface, when {{BlockReader}} 
contains all the required methods?
# {{BlockReader}}:
#* How do you plan to use {{ClientBlockReader}} in {{DFSInputStream}}? Will it 
replace the current {{blockReader}}?
#* The purpose of a block reader is to read a single block from a single 
DataNode. Instead of changing that logic, I think we need to start multiple 
block readers and coordinate them, similar to the [design | 
https://issues.apache.org/jira/secure/attachment/12687886/DataStripingSupportinHDFSClient.pdf]
 in HDFS-7545.
# {{BlockWriter}}:
#* The {{FSOutputSummer}} class is for _file_ output streams. I don't think 
it's appropriate as a block writer
#* In particular, as I mentioned in a [comment | 
https://issues.apache.org/jira/browse/HDFS-7344?focusedCommentId=14273774&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14273774]
 under HDFS-7344, I think block transfers between DataNodes can be much simpler 
than client-side pipelines. Unless there's an obvious advantage of streaming 
data to peer DataNodes in-the-fly, I don't think we should introduce the 
complex write pipeline logic to DN.
# I think _what can be shared_ between client and DN is the logic to coordinate 
multiple reading or writing threads.
#* We can refer to how [Reader | 
https://github.com/quantcast/qfs/blob/master/src/cc/libclient/Reader.cc] and 
[Writer | 
https://github.com/quantcast/qfs/blob/master/src/cc/libclient/Writer.cc] share 
the [RSSriper | 
https://github.com/quantcast/qfs/blob/master/src/cc/libclient/RSStriper.cc] 
logic in QFS. 
#* To make things simpler we can even start from developing client and DN code 
separately, and abstract out common logic later on.

Nits:
# {{ClientBlockReader#read(ByteBuffer buf)}}: what's the purpose of the 
following code? When will a partially-used buffer be passed in?
{code}
    int remaining = buf.remaining();
    if(remaining <= 0)
      return 0;
    byte[] b = new byte[remaining];
    int r = read(b, 0, b.length);
    for(int i = 0; i < r; i++)
      buf.put(b[i]);
{code}
# DatanodeBlockReaderTest:
#* Name should follow the convention and start with Test
#* It doesn't compile; need to change {{MiniDFSCluster.getBlockFile}} to 
{{cluster.getBlockFile}}

> Block Readers and Writers used in both client side and datanode side
> --------------------------------------------------------------------
>
>                 Key: HDFS-7653
>                 URL: https://issues.apache.org/jira/browse/HDFS-7653
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: BlockReadersWriters.patch
>
>
> There're a lot of block read/write operations in HDFS-EC, for example, when 
> client writes a file in striping layout, client has to write several blocks 
> to several different datanodes; if a datanode wants to do an 
> encoding/decoding task, it has to read several blocks from itself and other 
> datanodes, and writes one or more blocks to itself or other datanodes.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-7653) Block Readers and Writers used in both client side and datanode side

Reply via email to