[
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651014#comment-14651014
]
Walter Su commented on HDFS-8838:
---------------------------------
{code}
//DataStreamer#nextBlockOutputStream()
1464 lb = locateFollowingBlock(excluded.length > 0 ? excluded : null);
...
1475 success = createBlockOutputStream(nodes, storageTypes, 0L, false);
1477 if (!success) {...
1481 block = null;
1485 ...}
1486 } while (!success && --count >= 0);
{code}
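Assume streamer #3 is bumping the generation stamp (GS) while streamer #4 is
creating the next BlockOutputStream. Assume #4 has already polled from
{{followingBlocks}} and then fails at line 1475. Streamer #4 then sets
streamer.block = null (line 1481) and exits.
In that case the following loop never terminates: {{si.getBlock()}} keeps
returning null for #4, so {{i--}} retries the same index forever.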
{code}
// StripedDataStreamer#updateBlockForPipeline()#populate()
// StripedDataStreamer#updatePipeline()#populate()
for (int i = 0; i < NUM_DATA_BLOCKS + NUM_PARITY_BLOCKS; i++) {
  final StripedDataStreamer si = coordinator.getStripedDataStreamer(i);
  final ExtendedBlock bi = si.getBlock();
  if (bi != null) {...
  } else {...
    i--;
  }
{code}
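To make the concern concrete, here is a minimal, self-contained sketch of the loop shape above. It is not the patch code: {{FakeStreamer}}, {{hasExited()}}, {{populateWithRetry}}, {{populateSkippingExited}} and the {{maxSpins}} guard are all hypothetical names invented for illustration, and skipping exited streamers is only one possible way to break the loop, not necessarily the right fix for this patch.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the populate() loop; none of these types are real HDFS classes.
class PopulateLoopSketch {

    // Hypothetical stand-in for a streamer: a null block models streamer.block == null.
    static final class FakeStreamer {
        private final String block;
        private final boolean exited;

        FakeStreamer(String block, boolean exited) {
            this.block = block;
            this.exited = exited;
        }

        String getBlock() { return block; }

        // Assumption: some way exists to tell that a streamer has already exited.
        boolean hasExited() { return exited; }
    }

    // Same shape as the quoted loop: retry index i while its block is null.
    // maxSpins exists only so this demo terminates; the quoted loop has no such guard.
    // Returns true if the loop completed, false if the guard tripped.
    static boolean populateWithRetry(List<FakeStreamer> streamers, long maxSpins) {
        long spins = 0;
        for (int i = 0; i < streamers.size(); i++) {
            if (streamers.get(i).getBlock() != null) {
                continue;
            }
            if (++spins > maxSpins) {
                return false;
            }
            i--; // retry the same index, as in the quoted code
        }
        return true;
    }

    // One possible alternative: skip streamers that have exited instead of retrying them.
    // Returns the indices whose blocks were available.
    static List<Integer> populateSkippingExited(List<FakeStreamer> streamers) {
        List<Integer> populated = new ArrayList<>();
        for (int i = 0; i < streamers.size(); i++) {
            FakeStreamer s = streamers.get(i);
            if (s.getBlock() != null) {
                populated.add(i);
            } else if (!s.hasExited()) {
                // A live streamer with no block yet would need a real wait; out of scope here.
                throw new IllegalStateException("streamer " + i + " is still pending");
            }
        }
        return populated;
    }

    public static void main(String[] args) {
        List<FakeStreamer> streamers = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            streamers.add(new FakeStreamer("blk_" + i, false));
        }
        // Models streamer #4: block set to null, then the streamer exits.
        streamers.add(new FakeStreamer(null, true));

        System.out.println("retry loop finished: " + populateWithRetry(streamers, 1000));
        System.out.println("skipping exited: " + populateSkippingExited(streamers));
    }
}
```

With the fifth streamer modelled as exited and holding a null block, the retry variant only stops because of the artificial guard, while the skipping variant finishes after a single pass over the streamers.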
Overall the logic looks good. Synchronization is difficult. It's a big
contribution. Thanks very much.
> Tolerate datanode failures in DFSStripedOutputStream when the data length is
> small
> ----------------------------------------------------------------------------------
>
> Key: HDFS-8838
> URL: https://issues.apache.org/jira/browse/HDFS-8838
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Attachments: h8838_20150729.patch, h8838_20150731.patch
>
>
> Currently, DFSStripedOutputStream cannot tolerate datanode failures when the
> data length is small. We fix the bugs here and add more tests.