[jira] [Commented] (HDFS-8254) In StripedDataStreamer, it is hard to tolerate datanode failure in the leading streamer

Walter Su (JIRA) Fri, 29 May 2015 01:57:59 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564433#comment-14564433
 ]


Walter Su commented on HDFS-8254:
---------------------------------

This case passed.
{code}
  @Test(timeout=120000)
  public void testDatanodeFailure3() {
    final int length = NUM_DATA_BLOCKS*BLOCK_SIZE -1;
  ...
{code}

This case failed.
{code}
  @Test(timeout=120000)
  public void testDatanodeFailure3() {
    final int length = NUM_DATA_BLOCKS*BLOCK_SIZE;
  ...
{code}

Fix:
{code}
  private long getCurrentSumBytes() {
    long sum = 0;
    for (int i = 0; i < numDataBlocks; i++) {
+      // A failed streamer's bytesCurBlock is stale; don't count it.
+      if (streamers.get(i).isFailed()) {
+        continue;
+      }
      System.out.println(streamers.get(i).getBytesCurBlock());
      sum += streamers.get(i).getBytesCurBlock();
    }   
    return sum;
  }
{code}

Cause:
{{BytesCurBlock}} of the failed streamer isn't 0, so when the last stripe is full, we
call {{writeParityCells()}} twice.
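A minimal sketch of why the one-line fix matters, assuming a hypothetical, heavily simplified {{Streamer}} that only carries the two fields the patch touches (this is not the HDFS {{StripedDataStreamer}}). It shows only how a failed streamer's stale {{bytesCurBlock}} inflates the sum; it does not model {{writeParityCells()}} itself.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of the accounting in getCurrentSumBytes().
 * Streamer here is NOT the HDFS StripedDataStreamer.
 */
public class CurrentSumBytesSketch {

    /** Hypothetical stand-in for a data streamer: bytes in current block + failure flag. */
    record Streamer(long bytesCurBlock, boolean failed) {}

    /** Pre-fix behaviour: a failed streamer's stale byte count is still summed. */
    static long sumIncludingFailed(List<Streamer> streamers, int numDataBlocks) {
        long sum = 0;
        for (int i = 0; i < numDataBlocks; i++) {
            sum += streamers.get(i).bytesCurBlock();
        }
        return sum;
    }

    /** Post-fix behaviour: failed streamers are skipped. */
    static long sumSkippingFailed(List<Streamer> streamers, int numDataBlocks) {
        long sum = 0;
        for (int i = 0; i < numDataBlocks; i++) {
            Streamer s = streamers.get(i);
            if (s.failed()) {
                continue; // stale count must not contribute to the current sum
            }
            sum += s.bytesCurBlock();
        }
        return sum;
    }

    public static void main(String[] args) {
        // 3 data streamers with 64 bytes each; streamer 1 failed but kept its count.
        List<Streamer> streamers = List.of(
                new Streamer(64, false),
                new Streamer(64, true),
                new Streamer(64, false));
        System.out.println(sumIncludingFailed(streamers, 3)); // 192
        System.out.println(sumSkippingFailed(streamers, 3));  // 128
    }
}
```

Any stripe-fullness decision derived from the first sum would see bytes that the failed streamer will never actually write.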

To [~zhz]:
bq.  It also looks like we could run into a race condition if 2 streamers enter locateFollowingBlock around the same time?
I think it won't be an issue, because {{MultipleBlockingQueue.poll(..)}} is guarded by
{{synchronized(queues)}}.
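The argument above can be illustrated with a small sketch, assuming a hypothetical {{QueueSet}} class (not the HDFS {{MultipleBlockingQueue}}) whose {{poll(i)}} does a check-then-act, "if my queue is empty, request a new group and fill every queue", while holding the {{queues}} lock. The group-request step is a made-up placeholder. Because the whole check-then-act runs under one lock, two threads polling at the same time trigger only one request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class SynchronizedPollSketch {

    /** Hypothetical per-index queue set; not the actual HDFS MultipleBlockingQueue. */
    static class QueueSet {
        private final List<BlockingQueue<String>> queues = new ArrayList<>();
        final AtomicInteger groupRequests = new AtomicInteger();

        QueueSet(int n) {
            for (int i = 0; i < n; i++) {
                queues.add(new LinkedBlockingQueue<>());
            }
        }

        /** The empty-check and the refill are atomic because both run under the queues lock. */
        String poll(int i) {
            synchronized (queues) {
                if (queues.get(i).isEmpty()) {
                    // Hypothetical "request a block group": one entry per index.
                    int group = groupRequests.incrementAndGet();
                    for (int j = 0; j < queues.size(); j++) {
                        queues.get(j).offer("group" + group + "-block" + j);
                    }
                }
                return queues.get(i).poll();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        QueueSet set = new QueueSet(2);
        String[] got = new String[2];
        Thread a = new Thread(() -> got[0] = set.poll(0));
        Thread b = new Thread(() -> got[1] = set.poll(1));
        a.start();
        b.start();
        a.join();
        b.join();
        // Whichever thread wins the lock requests the group; the other reuses it.
        System.out.println(set.groupRequests.get()); // 1
        System.out.println(got[0] + " " + got[1]);   // group1-block0 group1-block1
    }
}
```

Without the {{synchronized}} block, both threads could observe an empty queue and each request a group.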

> In StripedDataStreamer, it is hard to tolerate datanode failure in the 
> leading streamer
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-8254
>                 URL: https://issues.apache.org/jira/browse/HDFS-8254
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>         Attachments: h8254_20150526.patch, h8254_20150526b.patch
>
>
> StripedDataStreamer javadoc is shown below.
> {code}
>  * The StripedDataStreamer class is used by {@link DFSStripedOutputStream}.
>  * There are two kinds of StripedDataStreamer, leading streamer and ordinary
>  * stream. Leading streamer requests a block group from NameNode, unwraps
>  * it to located blocks and transfers each located block to its corresponding
>  * ordinary streamer via a blocking queue.
> {code}
> The leading streamer is the streamer with index 0. When the datanode of the
> leading streamer fails, the other streamers cannot continue since no one will
> request a block group from NameNode anymore.
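The leading/ordinary hand-off described above can be sketched as follows, assuming a hypothetical {{LeadingStreamerSketch}} in which index 0 "requests a block group" (a placeholder, no NameNode involved) and offers block i to streamer i's blocking queue. Ordinary streamers wait with a bounded {{BlockingQueue.poll(timeout, unit)}} so the sketch terminates. Once the leader stops allocating, a follower simply gets nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class LeadingStreamerSketch {
    // One queue per streamer index; index 0 is the leader and keeps its own block.
    private final List<BlockingQueue<String>> queues = new ArrayList<>();

    LeadingStreamerSketch(int numStreamers) {
        for (int i = 0; i < numStreamers; i++) {
            queues.add(new LinkedBlockingQueue<>());
        }
    }

    /** Leader: hypothetical "request a block group, unwrap it, hand out block i to streamer i". */
    String leaderAllocate(int groupId) {
        for (int i = 1; i < queues.size(); i++) {
            queues.get(i).offer("group" + groupId + "-block" + i);
        }
        return "group" + groupId + "-block0"; // leader keeps block 0
    }

    /** Ordinary streamer: waits up to timeoutMs for the leader; null means nothing arrived. */
    String followerNextBlock(int index, long timeoutMs) {
        try {
            return queues.get(index).poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return null;
        }
    }

    public static void main(String[] args) {
        LeadingStreamerSketch s = new LeadingStreamerSketch(3);
        s.leaderAllocate(1);
        System.out.println(s.followerNextBlock(1, 100)); // group1-block1
        // The leader's datanode "fails": nobody calls leaderAllocate(2).
        System.out.println(s.followerNextBlock(1, 100)); // null
    }
}
```

With an unbounded {{take()}} instead of a timed {{poll}}, the follower in the second call would block forever, which is the situation the issue describes.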



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
