[ 
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651014#comment-14651014
 ] 

Walter Su commented on HDFS-8838:
---------------------------------

{code}
//DataStreamer#nextBlockOutputStream()
1464       lb = locateFollowingBlock(excluded.length > 0 ? excluded : null);
...
1475       success = createBlockOutputStream(nodes, storageTypes, 0L, false);
1477       if (!success) {...
1481         block = null;
1485    ...}
1486     } while (!success && --count >= 0);
{code}
Assume streamer #3 is bumping the GS, and streamer #4 is creating the next BlockOutputStream.
Assume #4 has polled from {{followingBlocks}} and then failed at line 1475. Streamer #4's
block is set to null (line 1481), and the streamer exits.
The following code then enters an infinite loop.
{code}
// StripedDataStreamer#updateBlockForPipeline()#populate()
// StripedDataStreamer#updatePipeline()#populate()
  for (int i = 0; i < NUM_DATA_BLOCKS + NUM_PARITY_BLOCKS; i++) {
    final StripedDataStreamer si = coordinator.getStripedDataStreamer(i);
    final ExtendedBlock bi = si.getBlock();
    if (bi != null) {...
    } else {...
      i--;
    }
  }
{code}
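To make the hang concrete, here is a minimal standalone sketch (not the patch code; the array, the failed index, and the retry budget are illustrative assumptions): once one streamer's block stays null, the {{i--}} re-check spins on the same index forever, and a bounded retry (or skipping the failed streamer) is one possible way out.
{code}
// Minimal standalone sketch, NOT the HDFS implementation: blocks[] stands in
// for the per-streamer blocks; index 4 stays null, as when streamer #4 exits
// after line 1481 above.
public class PopulateLoopSketch {
  static final int NUM_BLOCKS = 9;                  // e.g. 6 data + 3 parity
  static final Object[] blocks = new Object[NUM_BLOCKS];

  static void populateAsQuoted() {
    for (int i = 0; i < NUM_BLOCKS; i++) {
      final Object bi = blocks[i];
      if (bi != null) {
        // ... populate the new block/GS for streamer i ...
      } else {
        i--;  // re-check the same index; if blocks[i] never becomes
              // non-null, this loop never terminates
      }
    }
  }

  // One possible guard (illustrative only): bound the number of re-checks
  // per index and treat the streamer as failed instead of spinning.
  static void populateWithBoundedRetry() {
    for (int i = 0; i < NUM_BLOCKS; i++) {
      int retries = 3;
      while (blocks[i] == null && retries-- > 0) {
        // in the real code this would wait for / re-poll the streamer
      }
      if (blocks[i] == null) {
        continue;  // give up on this streamer rather than looping forever
      }
      // ... populate the new block/GS for streamer i ...
    }
  }

  public static void main(String[] args) {
    for (int i = 0; i < NUM_BLOCKS; i++) {
      if (i != 4) {
        blocks[i] = new Object();                   // streamer #4 has no block
      }
    }
    populateWithBoundedRetry();                     // returns
    populateAsQuoted();                             // hangs on i == 4
  }
}
{code}
Whichever fix you prefer (a bounded retry, waiting on the coordinator, or excluding the failed streamer), the loop needs some exit path for a streamer that never produces a block.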
Overall the logic looks good. The synchronization is difficult to get right. It's a big 
contribution. Thanks very much.

> Tolerate datanode failures in DFSStripedOutputStream when the data length is 
> small
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-8838
>                 URL: https://issues.apache.org/jira/browse/HDFS-8838
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>         Attachments: h8838_20150729.patch, h8838_20150731.patch
>
>
> Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
> data length is small.  We fix the bugs here and add more tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
