[jira] [Updated] (HDFS-7729) Add logic to DFSOutputStream to support writing a file in striping layout

Zhe Zhang (JIRA) Tue, 10 Feb 2015 13:55:23 -0800

     [ https://issues.apache.org/jira/browse/HDFS-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Zhe Zhang updated HDFS-7729:
----------------------------
    Attachment: HDFS-7729-005.patch

Thanks Bo, this patch looks much better now! The client striping logic is a 
complex piece and I believe we are getting closer.

Logic:
# {{stripeBlocks}} is a key data structure.
#* I like the current {{BlockingQueue}}-based implementation. It's simple and 
handles the most basic scenario, where all streamers work at approximately the 
same rate.
#* There will be quite a bit of follow-on work to handle failures and slow 
writers.
#* We should probably bound the size of the blocking queues.
{code}
stripeBlocks[i] = new LinkedBlockingQueue<LocatedBlock>();
{code}
#* We should avoid repeating the {{addBlock}} logic. Maybe we should make 
{{nextBlockOutputStream}} work for both contiguous and striped blocks. I've 
attached a patch to demonstrate the idea; please let me know if it looks OK. 
It also contains some other detailed changes.
# {{blocksForUnitTest}} can be obtained via an RPC call. See example below:
{code}
          List<LocatedBlock> locatedBlocks = 
              cluster.getNameNode().getRpcServer().getBlockLocations(
              TEST_FILE, 0, TEST_FILE_LEN).getLocatedBlocks();
{code}
# The following variables are moved to {{DataStreamer}}, but they are only 
accessed in the outer {{DFSOutputStream}} class. I think they should stay 
under {{DFSOutputStream}}, but be converted to arrays?
{code}
    private long currentSeqno = 0;
    private long lastQueuedSeqno = -1;
    private long lastAckedSeqno = -1;
    private long bytesCurBlock = 0; 
{code}
# {{writeChunk}} is another key method.
#* How does the following handle crossing cell boundaries? What if 
{{sizeOfCellInBuffer}} is larger than {{cellSize}}?
{code}
      addToCellBuffer(b, offset, len);
      if (sizeOfCellInBuffer[curIdx] == cellSize) {
{code}
#* Right now we need to handle both _cell full_ and _packet full_ conditions. 
I'm thinking maybe we should unify cell size and packet size in this phase. We 
can make cell size configurable as a follow-on task.
# {{TestDFSOutputStreamStripingLayout}}
#* It should use {{@Before}} and {{@After}} methods like other unit tests.
#* I tried adding a multi-group test and it didn't work 
({{ArrayIndexOutOfBoundsException}}).
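
On the cell-boundary question in the {{writeChunk}} item, here is a minimal sketch of one possible approach, not the patch's actual code. It assumes incoming bytes may be split across cells, so that a cell buffer can never hold more than {{cellSize}} bytes. The class {{CellBufferSketch}}, its fields, and {{stripesCompleted}} are hypothetical names used only for illustration; a real implementation would hand full cells to the streamers instead of just recording them.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the patch: splits incoming bytes at cell
// boundaries so a single cell never holds more than cellSize bytes.
class CellBufferSketch {
  private final ByteBuffer[] cellBuffers;
  private int curIdx = 0;
  // Index of every cell that became full, in the order it happened.
  final List<Integer> fullCells = new ArrayList<>();
  // Number of stripes whose data cells have all been filled.
  int stripesCompleted = 0;

  CellBufferSketch(int numDataBlocks, int cellSize) {
    if (numDataBlocks <= 0 || cellSize <= 0) {
      throw new IllegalArgumentException("sizes must be positive");
    }
    cellBuffers = new ByteBuffer[numDataBlocks];
    for (int i = 0; i < numDataBlocks; i++) {
      cellBuffers[i] = ByteBuffer.allocate(cellSize);
    }
  }

  void write(byte[] b, int offset, int len) {
    while (len > 0) {
      ByteBuffer cur = cellBuffers[curIdx];
      // Copy only what fits into the current cell; the rest goes to the next.
      int toCopy = Math.min(len, cur.remaining());
      cur.put(b, offset, toCopy);
      offset += toCopy;
      len -= toCopy;
      if (!cur.hasRemaining()) {
        fullCells.add(curIdx);
        curIdx++;
        if (curIdx == cellBuffers.length) {
          // All data cells of the stripe are full. A real writer would
          // compute parity and enqueue packets here before reusing buffers.
          stripesCompleted++;
          for (ByteBuffer buf : cellBuffers) {
            buf.clear();
          }
          curIdx = 0;
        }
      }
    }
  }

  int bytesInCell(int idx) {
    return cellBuffers[idx].position();
  }
}
```

With this shape the "cell full" check can stay an equality test, because the copy length is capped by the space remaining in the current cell.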

Nits:
# We usually use 2 spaces to indent. It seems your IDE uses 4 spaces.
# Let's avoid bracket-less statements (see Apple's 
[goto bug|http://www.wired.com/2014/02/gotofail/])
{code}
          for(int k = 0; k < blockGroupDataBlocks; k++)
            cellBuffers[k].flip();
{code}
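
For reference, a braced form of the loop above might look like the following sketch. The wrapper class {{FlipWithBraces}} and the method {{flipAll}} are hypothetical and exist only to make the snippet self-contained; the loop body is unchanged.

```java
import java.nio.ByteBuffer;

// Hypothetical wrapper so the loop can be compiled on its own.
class FlipWithBraces {
  static void flipAll(ByteBuffer[] cellBuffers, int blockGroupDataBlocks) {
    // Same loop as in the patch, with explicit braces around the body.
    for (int k = 0; k < blockGroupDataBlocks; k++) {
      cellBuffers[k].flip();
    }
  }
}
```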

> Add logic to DFSOutputStream to support writing a file in striping layout 
> --------------------------------------------------------------------------
>
>                 Key: HDFS-7729
>                 URL: https://issues.apache.org/jira/browse/HDFS-7729
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: Codec-tmp.patch, HDFS-7729-001.patch, 
> HDFS-7729-002.patch, HDFS-7729-003.patch, HDFS-7729-004.patch, 
> HDFS-7729-005.patch
>
>
> If a client wants to write a file directly in striping layout, we need to 
> add some logic to DFSOutputStream.  DFSOutputStream needs multiple 
> DataStreamers to write each cell of a stripe to a remote datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
