[ 
https://issues.apache.org/jira/browse/HDFS-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301765#comment-14301765
 ] 

Zhe Zhang commented on HDFS-7545:
---------------------------------

Thanks Bo for the good work! The high level logic looks good.

Nits:
# We should avoid unnecessarily moving lines, such as {{private long 
currentSeqno = 0;}}
# Eclipse tries to automatically combine import entries, like below. We should 
avoid introducing wildcard imports unnecessarily.
{code}
-import org.apache.hadoop.fs.CanSetDropBehind;
-import org.apache.hadoop.fs.CreateFlag;
-import org.apache.hadoop.fs.FSOutputSummer;
-import org.apache.hadoop.fs.FileAlreadyExistsException;
-import org.apache.hadoop.fs.FileEncryptionInfo;
-import org.apache.hadoop.fs.ParentNotDirectoryException;
-import org.apache.hadoop.fs.Syncable;
+import org.apache.hadoop.fs.*;
{code}
# {{stripeLayout}} should be {{stripedLayout}} or {{stripingLayout}}
# Typo: {{// if append , should read some packet ahean}}
# 3 blank lines at 2113~2115; double blank lines at other places.
# Some lines are too long (e.g. 2085). All lines should be within 80 chars.

Logics:
# Since the {{org.apache.hadoop.hdfs.ec}} package is still being developed, 
let's use a "dummy" encode function in this patch, to make it functional (right 
now it doesn't compile).
#* In cases like this, we can add a {{//TODO}} item with the JIRA number
# We need a unit testing class
# NameNode patch is already in. Could you update accordingly?
#* Client still gets a {{LocatedBlock}} from NN, whose ID is the first block ID 
in the group.
#* From the above, the client calculates the IDs of remaining blocks in the 
group
# {{writeChunk}} has the logic of striping input data into the list of 
{{dataQueue}}s. This is fairly complex and we need to clearly document it in 
the comments
# Is {{cellBuffers}} only used for parity calculation? It needs some 
documentation too.
# [Question] How large is each {{Packet}} in {{DFSOutputStream}}? I haven't 
read that part of code in detail.
# The below handles queuing parity data packets. Have you tested it? Seems to 
me it will set {{currentPacket}} to {{null}} and we'll lose the last data 
packet before the parity one. What's reason of using {{queueCurrentPacket}} 
instead of {{waitAndQueueCurrentPacket}}?
{code}
        if (curIdx == blockGroupDataBlocks) {
          encode(cellBuffers);
          for (int i = blockGroupDataBlocks; i < blockGroupSize; i++) {
            ByteBuffer parityBuffer = cellBuffers[i];
            parityBuffer.flip();
            List<Packet> packets = generatePackets(parityBuffer);
            for (Packet p : packets) {
              currentPacket = p;
              queueCurrentPacket();
            }
{code}


> Data striping support in HDFS client
> ------------------------------------
>
>                 Key: HDFS-7545
>                 URL: https://issues.apache.org/jira/browse/HDFS-7545
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Li Bo
>         Attachments: DataStripingSupportinHDFSClient.pdf, 
> HDFS-7545-001-DFSOutputStream.patch, HDFS-7545-PoC.patch, clientStriping.patch
>
>
> Data striping is a commonly used data layout with critical benefits in the 
> context of erasure coding. This JIRA aims to extend HDFS client to work with 
> striped blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to