[ 
https://issues.apache.org/jira/browse/HDFS-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953652#comment-14953652
 ] 

Zhe Zhang commented on HDFS-9079:
---------------------------------

Thanks for the comments, Walter!

It's a very good point that the current patch doesn't handle failures of the 
streamer threads. Since the change is already quite large, maybe we can leave 
that as a separate JIRA, if we at least agree on the basic direction of this 
JIRA? I'll try to rev the patch to complete the handling of DN failures, and 
try to add some basic handling of streamer thread failures.

I'm currently debugging the patch against 
{{TestDFSStripedOutputStreamWithFailure}}. I think the logic of allocating 
multiple genStamps violates some assumptions in {{runTest}}. The test passes 
whenever I run a single configuration of the parameter set below (e.g., if I 
change {{runTestWithMultipleFailure}} to test only a single entry in 
{{dnIndexSuite}}), but it fails when multiple configurations are run.
{code}
private void runTest(final int length, final int[] killPos,
      final int[] dnIndex, final boolean tokenExpire) throws Exception {
{code}

> Erasure coding: preallocate multiple generation stamps and serialize updates 
> from data streamers
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9079
>                 URL: https://issues.apache.org/jira/browse/HDFS-9079
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: HDFS-7285
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-9079-HDFS-7285.00.patch, HDFS-9079.01.patch
>
>
> A non-striped DataStreamer goes through the following steps in error handling:
> {code}
> 1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4) 
> Applies new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6) 
> Updates block on NN
> {code}
> To simplify the above we can preallocate GS when NN creates a new striped 
> block group ({{FSN#createNewBlock}}). For each new striped block group we can 
> reserve {{NUM_PARITY_BLOCKS}} GS's. Then steps 1~3 in the above sequence can 
> be saved. If more than {{NUM_PARITY_BLOCKS}} errors have happened we 
> shouldn't try to further recover anyway.
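
As a rough illustration of the preallocation idea in the description above, 
here is a minimal Java sketch. It is not taken from the patch: the class name 
{{PreallocatedGenStamps}}, its methods, the value 3 for 
{{NUM_PARITY_BLOCKS}}, and the choice of contiguous GS values are all 
assumptions made for the example. It only shows a fixed pool of reserved GS's 
being consumed one per failure, with access serialized across streamers, and 
recovery refused once the pool is empty.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of a per-block-group pool of reserved generation stamps. */
class PreallocatedGenStamps {
    // Assumed value for illustration only; the real constant lives in HDFS.
    static final int NUM_PARITY_BLOCKS = 3;

    private final Deque<Long> reserved = new ArrayDeque<>();

    /**
     * Reserves NUM_PARITY_BLOCKS stamps following the block group's initial GS.
     * Contiguous values are an assumption of this sketch.
     */
    PreallocatedGenStamps(long initialGS) {
        for (int i = 1; i <= NUM_PARITY_BLOCKS; i++) {
            reserved.addLast(initialGS + i);
        }
    }

    /**
     * Called when a streamer hits an error. The method is synchronized so that
     * concurrent streamers take stamps one at a time, with no NN round trip.
     */
    synchronized long nextGenStamp() {
        Long gs = reserved.pollFirst();
        if (gs == null) {
            // More failures than parity blocks: do not try to recover further.
            throw new IllegalStateException(
                "more than " + NUM_PARITY_BLOCKS + " failures; giving up");
        }
        return gs;
    }

    /** Number of reserved stamps not yet handed out. */
    synchronized int remaining() {
        return reserved.size();
    }
}
```

In this sketch, each call to {{nextGenStamp()}} stands in for steps 2 and 3 of 
the non-striped sequence (asking the NN for a new GS and receiving it); 
applying the GS to the DNs and updating the NN are left out.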



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
