[
https://issues.apache.org/jira/browse/HDFS-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301765#comment-14301765
]
Zhe Zhang commented on HDFS-7545:
---------------------------------
Thanks Bo for the good work! The high level logic looks good.
Nits:
# We should avoid unnecessarily moving lines, such as {{private long
currentSeqno = 0;}}
# Eclipse tries to automatically combine import entries, like below. We should
avoid introducing wildcard imports unnecessarily.
{code}
-import org.apache.hadoop.fs.CanSetDropBehind;
-import org.apache.hadoop.fs.CreateFlag;
-import org.apache.hadoop.fs.FSOutputSummer;
-import org.apache.hadoop.fs.FileAlreadyExistsException;
-import org.apache.hadoop.fs.FileEncryptionInfo;
-import org.apache.hadoop.fs.ParentNotDirectoryException;
-import org.apache.hadoop.fs.Syncable;
+import org.apache.hadoop.fs.*;
{code}
# {{stripeLayout}} should be {{stripedLayout}} or {{stripingLayout}}
# Typo: {{// if append , should read some packet ahean}}
# 3 blank lines at 2113~2115; double blank lines at other places.
# Some lines are too long (e.g. 2085). All lines should be within 80 chars.
Logics:
# Since the {{org.apache.hadoop.hdfs.ec}} package is still being developed,
let's use a "dummy" encode function in this patch, to make it functional (right
now it doesn't compile).
#* In cases like this, we can add a {{//TODO}} item with the JIRA number
# We need a unit testing class
# NameNode patch is already in. Could you update accordingly?
#* Client still gets a {{LocatedBlock}} from NN, whose ID is the first block ID
in the group.
#* From the above, the client calculates the IDs of remaining blocks in the
group
# {{writeChunk}} has the logic of striping input data into the list of
{{dataQueue}}s. This is fairly complex and we need to clearly document it in
the comments
# Is {{cellBuffers}} only used for parity calculation? It needs some
documentation too.
# [Question] How large is each {{Packet}} in {{DFSOutputStream}}? I haven't
read that part of code in detail.
# The below handles queuing parity data packets. Have you tested it? Seems to
me it will set {{currentPacket}} to {{null}} and we'll lose the last data
packet before the parity one. What's reason of using {{queueCurrentPacket}}
instead of {{waitAndQueueCurrentPacket}}?
{code}
if (curIdx == blockGroupDataBlocks) {
encode(cellBuffers);
for (int i = blockGroupDataBlocks; i < blockGroupSize; i++) {
ByteBuffer parityBuffer = cellBuffers[i];
parityBuffer.flip();
List<Packet> packets = generatePackets(parityBuffer);
for (Packet p : packets) {
currentPacket = p;
queueCurrentPacket();
}
{code}
> Data striping support in HDFS client
> ------------------------------------
>
> Key: HDFS-7545
> URL: https://issues.apache.org/jira/browse/HDFS-7545
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Zhe Zhang
> Assignee: Li Bo
> Attachments: DataStripingSupportinHDFSClient.pdf,
> HDFS-7545-001-DFSOutputStream.patch, HDFS-7545-PoC.patch, clientStriping.patch
>
>
> Data striping is a commonly used data layout with critical benefits in the
> context of erasure coding. This JIRA aims to extend HDFS client to work with
> striped blocks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)