[
https://issues.apache.org/jira/browse/HDFS-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564433#comment-14564433
]
Walter Su commented on HDFS-8254:
---------------------------------
This case passed.
{code}
@Test(timeout=120000)
public void testDatanodeFailure3() {
  final int length = NUM_DATA_BLOCKS * BLOCK_SIZE - 1;
  ...
{code}
This case failed.
{code}
@Test(timeout=120000)
public void testDatanodeFailure3() {
  final int length = NUM_DATA_BLOCKS * BLOCK_SIZE;
  ...
{code}
Fix:
{code}
private long getCurrentSumBytes() {
  long sum = 0;
  for (int i = 0; i < numDataBlocks; i++) {
+   if (streamers.get(i).isFailed()) {
+     continue;
+   }
    System.out.println(streamers.get(i).getBytesCurBlock());
    sum += streamers.get(i).getBytesCurBlock();
  }
  return sum;
}
{code}
Cause: {{BytesCurBlock}} of the failed streamer isn't 0, so {{getCurrentSumBytes()}}
still counts it. As a result, when the last stripe is full, we call
{{writeParityCells()}} twice.
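To make the effect of the fix concrete, here is a self-contained sketch of the patched summation that runs outside HDFS. {{StubStreamer}} is a hypothetical stand-in for {{StripedDataStreamer}} exposing only {{getBytesCurBlock()}} and {{isFailed()}}; the stream width and byte counts are made up for illustration and do not come from the test.

```java
import java.util.List;

public class CurrentSumBytesSketch {
  // Hypothetical stand-in for StripedDataStreamer: only the two accessors
  // that getCurrentSumBytes() touches.
  static final class StubStreamer {
    private final long bytesCurBlock;
    private final boolean failed;

    StubStreamer(long bytesCurBlock, boolean failed) {
      this.bytesCurBlock = bytesCurBlock;
      this.failed = failed;
    }

    long getBytesCurBlock() { return bytesCurBlock; }
    boolean isFailed() { return failed; }
  }

  // Patched summation: a failed streamer's counter is stale, so skip it.
  static long getCurrentSumBytes(List<StubStreamer> streamers, int numDataBlocks) {
    long sum = 0;
    for (int i = 0; i < numDataBlocks; i++) {
      StubStreamer s = streamers.get(i);
      if (s.isFailed()) {
        continue;
      }
      sum += s.getBytesCurBlock();
    }
    return sum;
  }

  public static void main(String[] args) {
    int numDataBlocks = 3; // hypothetical, kept small for readability
    List<StubStreamer> streamers = List.of(
        new StubStreamer(1024, true),   // failed, but bytesCurBlock is still non-zero
        new StubStreamer(4096, false),
        new StubStreamer(4096, false),
        new StubStreamer(4096, false)); // parity streamer: outside the loop bound
    System.out.println(getCurrentSumBytes(streamers, numDataBlocks)); // 8192
  }
}
```

Without the {{isFailed()}} check the same call would return 9216, because the failed streamer's stale 1024 bytes would be added to the total.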
To [~zhz]:
bq. It also looks like we could run into a race condition if 2 streamers enter
locateFollowingBlock around the same time?
I don't think it will be an issue, because {{MultipleBlockingQueue.poll(..)}}
runs inside {{synchronized(queues)}}.
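To illustrate why a guard on {{queues}} rules out that race, here is a hypothetical model, not the code in the patch: each streamer polls its own queue, and if that queue is empty it allocates a new block group and hands one entry to every queue. {{MultiQueue}} and {{pollOrFill}} are names invented for this sketch. The argument only holds if the whole check-then-allocate step runs under the same lock, which is an assumption about the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class GuardedQueuesDemo {
  // Hypothetical model: one queue per streamer, all guarded by the same monitor.
  static final class MultiQueue<T> {
    private final List<LinkedBlockingQueue<T>> queues = new ArrayList<>();

    MultiQueue(int numQueues) {
      for (int i = 0; i < numQueues; i++) {
        queues.add(new LinkedBlockingQueue<>());
      }
    }

    // Check-then-act inside synchronized (queues): if queue i is empty, allocate
    // once and give the new item to every queue, then take queue i's item.
    T pollOrFill(int i, Supplier<T> allocate) {
      synchronized (queues) {
        if (queues.get(i).isEmpty()) {
          T item = allocate.get();
          for (LinkedBlockingQueue<T> q : queues) {
            q.offer(item);
          }
        }
        return queues.get(i).poll();
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    int numStreamers = 9; // assumed: 6 data + 3 parity streamers
    MultiQueue<Integer> mq = new MultiQueue<>(numStreamers);
    AtomicInteger allocations = new AtomicInteger();
    List<Thread> threads = new ArrayList<>();
    for (int i = 0; i < numStreamers; i++) {
      final int idx = i;
      Thread t = new Thread(() -> mq.pollOrFill(idx, allocations::incrementAndGet));
      threads.add(t);
      t.start();
    }
    for (Thread t : threads) {
      t.join();
    }
    // All threads start with an empty queue, yet only the first to get the lock allocates.
    System.out.println("block groups allocated: " + allocations.get()); // 1
  }
}
```

Without the {{synchronized}} block, two threads could both observe an empty queue and both allocate, which is the kind of race the question above is about.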
> In StripedDataStreamer, it is hard to tolerate datanode failure in the
> leading streamer
> ---------------------------------------------------------------------------------------
>
> Key: HDFS-8254
> URL: https://issues.apache.org/jira/browse/HDFS-8254
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Attachments: h8254_20150526.patch, h8254_20150526b.patch
>
>
> StripedDataStreamer javadoc is shown below.
> {code}
> * The StripedDataStreamer class is used by {@link DFSStripedOutputStream}.
> * There are two kinds of StripedDataStreamer, leading streamer and ordinary
> * streamer. Leading streamer requests a block group from NameNode, unwraps
> * it to located blocks and transfers each located block to its corresponding
> * ordinary streamer via a blocking queue.
> {code}
> Leading streamer is the streamer with index 0. When the datanode of the
> leading streamer fails, the other streamers cannot continue since no one will
> request a block group from NameNode anymore.
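A minimal single-threaded sketch of the hand-off described in the javadoc, and of why it stalls once the leader is gone. Everything here is hypothetical: {{requestBlockGroup()}} fakes the NameNode call with made-up block names, the width of 9 assumes 6 data + 3 parity blocks, and ordinary streamers use a bounded poll so the stall shows up as {{null}} instead of blocking forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class LeadingStreamerSketch {
  // Hypothetical NameNode call: a "block group" is just one block name per index.
  static List<String> requestBlockGroup(int groupId, int width) {
    List<String> blocks = new ArrayList<>();
    for (int i = 0; i < width; i++) {
      blocks.add("blk_" + groupId + "_" + i);
    }
    return blocks;
  }

  // One queue per streamer index; index 0 (the leader) keeps its block directly.
  static List<BlockingQueue<String>> makeQueues(int width) {
    List<BlockingQueue<String>> queues = new ArrayList<>();
    for (int i = 0; i < width; i++) {
      queues.add(new LinkedBlockingQueue<>());
    }
    return queues;
  }

  // Leading streamer: request a group, keep block 0, hand block i to queue i.
  static String leaderAllocate(int groupId, List<BlockingQueue<String>> queues) {
    List<String> group = requestBlockGroup(groupId, queues.size());
    for (int i = 1; i < group.size(); i++) {
      queues.get(i).offer(group.get(i));
    }
    return group.get(0);
  }

  // Ordinary streamer i waits (bounded) for its located block; null means nothing arrived.
  static String ordinaryTake(int i, List<BlockingQueue<String>> queues, long timeoutMs) {
    try {
      return queues.get(i).poll(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }

  public static void main(String[] args) {
    int width = 9; // assumed: 6 data + 3 parity blocks per group
    List<BlockingQueue<String>> queues = makeQueues(width);

    // Normal case: the leader fetches group 1 and hands out blocks 1..8.
    System.out.println(leaderAllocate(1, queues));    // blk_1_0
    System.out.println(ordinaryTake(3, queues, 100)); // blk_1_3

    // Leader's datanode fails: nobody requests group 2, so streamer 3 gets nothing.
    System.out.println(ordinaryTake(3, queues, 100)); // null
  }
}
```

The single point of failure is visible in the structure: only {{leaderAllocate}} ever fills the queues, so the ordinary streamers have no way to make progress on their own.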
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)