[
https://issues.apache.org/jira/browse/HDFS-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhe Zhang updated HDFS-7729:
----------------------------
Attachment: HDFS-7729-005.patch
Thanks Bo, this patch looks much better now! Client striping logic is a
complex piece and I believe we are getting closer.
Logic:
# {{stripeBlocks}} is a key data structure.
#* I like the current {{BlockingQueue}}-based implementation. It's simple and
handles the most basic scenario, where streamers work at approximately the
same rate.
#* There will be quite a bit of follow-on work to handle failures and slow
writers.
#* We should probably bound the size of the blocking queues. They are
currently created without a capacity, so they are effectively unbounded:
{code}
stripeBlocks[i] = new LinkedBlockingQueue<LocatedBlock>();
{code}
#* We should avoid repeating the {{addBlock}} logic. Maybe we should make
{{nextBlockOutputStream}} work for both contiguous and striped blocks. I've
attached a patch to demo the thoughts; please let me know if it looks OK. It
also has some other detailed changes.
# {{blocksForUnitTest}} can be obtained via an RPC call. See example below:
{code}
List<LocatedBlock> locatedBlocks =
cluster.getNameNode().getRpcServer().getBlockLocations(
TEST_FILE, 0, TEST_FILE_LEN).getLocatedBlocks();
{code}
# The following variables were moved to {{DataStreamer}}, but they are only
accessed in the outer {{DFSOutputStream}} class. I think they should stay in
{{DFSOutputStream}}, perhaps converted to arrays?
{code}
private long currentSeqno = 0;
private long lastQueuedSeqno = -1;
private long lastAckedSeqno = -1;
private long bytesCurBlock = 0;
{code}
# {{writeChunk}} is another key method
#* How does the following handle writes that cross a cell boundary? If
{{addToCellBuffer}} pushes {{sizeOfCellInBuffer[curIdx]}} past {{cellSize}},
the {{==}} check would not fire.
{code}
addToCellBuffer(b, offset, len);
if (sizeOfCellInBuffer[curIdx] == cellSize) {
{code}
#* Right now we need to handle both _cell full_ and _packet full_ conditions.
I'm thinking maybe we should unify cell size and packet size in this phase. We
can make cell size configurable as a follow-on task.
# {{TestDFSOutputStreamStripingLayout}}
#* It should use {{@Before}} and {{@After}} methods like other unit tests.
#* I tried adding a multi-group test and it failed with an
{{ArrayIndexOutOfBoundsException}}.
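To make the cell-boundary question about {{writeChunk}} concrete, here is a minimal sketch of one way to split an incoming write so that a cell buffer never holds more than {{cellSize}} bytes. It is not code from the patch: the class {{CellBufferSketch}}, its methods, and the round-robin advance to the next buffer are all hypothetical, and flushing is reduced to recording the buffer index.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not from the patch): split writes at cell boundaries. */
class CellBufferSketch {
  private final ByteBuffer[] cellBuffers;
  private int curIdx = 0;
  // Records which buffer index was flushed, in order, so callers can inspect it.
  private final List<Integer> flushedCells = new ArrayList<>();

  CellBufferSketch(int numDataBlocks, int cellSize) {
    cellBuffers = new ByteBuffer[numDataBlocks];
    for (int i = 0; i < numDataBlocks; i++) {
      cellBuffers[i] = ByteBuffer.allocate(cellSize);
    }
  }

  /** Copies len bytes from b, splitting wherever the data crosses a cell boundary. */
  void write(byte[] b, int offset, int len) {
    while (len > 0) {
      ByteBuffer cell = cellBuffers[curIdx];
      // Never copy more than the current cell can still hold.
      int toCopy = Math.min(len, cell.remaining());
      cell.put(b, offset, toCopy);
      offset += toCopy;
      len -= toCopy;
      if (!cell.hasRemaining()) {
        flushCell();
      }
    }
  }

  private void flushCell() {
    // Assumption: stands in for handing the full cell to a streamer.
    flushedCells.add(curIdx);
    cellBuffers[curIdx].clear();
    curIdx = (curIdx + 1) % cellBuffers.length;
  }

  List<Integer> getFlushedCells() {
    return flushedCells;
  }

  int bytesInCurrentCell() {
    return cellBuffers[curIdx].position();
  }
}
```

Because each copy is capped at the space remaining in the current cell, the fill level can only reach {{cellSize}} exactly, so an equality-style "cell full" check is safe in this arrangement.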
Nits:
# We usually use 2 spaces to indent. It seems your IDE uses 4 spaces.
# Let's avoid brace-less statements (see Apple's
[goto fail bug|http://www.wired.com/2014/02/gotofail/])
{code}
for(int k = 0; k < blockGroupDataBlocks; k++)
cellBuffers[k].flip();
{code}
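For reference, a braced version of that loop might look like the sketch below. The wrapper class {{FlipCellBuffers}} and the method {{flipAll}} are hypothetical and exist only to make the snippet self-contained; the array length stands in for {{blockGroupDataBlocks}}.

```java
import java.nio.ByteBuffer;

/** Hypothetical wrapper (not from the patch) showing the loop with braces. */
class FlipCellBuffers {
  /** Flips every buffer so its written bytes can be read from the start. */
  static void flipAll(ByteBuffer[] cellBuffers) {
    for (int k = 0; k < cellBuffers.length; k++) {
      cellBuffers[k].flip();
    }
  }
}
```

With the braces in place, a statement added to the loop later cannot silently end up outside it.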
> Add logic to DFSOutputStream to support writing a file in striping layout
> --------------------------------------------------------------------------
>
> Key: HDFS-7729
> URL: https://issues.apache.org/jira/browse/HDFS-7729
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Li Bo
> Assignee: Li Bo
> Attachments: Codec-tmp.patch, HDFS-7729-001.patch,
> HDFS-7729-002.patch, HDFS-7729-003.patch, HDFS-7729-004.patch,
> HDFS-7729-005.patch
>
>
> If a client wants to directly write a file in striping layout, we need to add some
> logic to DFSOutputStream. DFSOutputStream needs multiple DataStreamers to
> write each cell of a stripe to a remote datanode.