[ https://issues.apache.org/jira/browse/HDFS-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14910134#comment-14910134 ]

Zhe Zhang commented on HDFS-9079:
---------------------------------

Thanks for the helpful comment, Walter.

bq. setupPipelineForAppendOrRecovery() will trim bad nodes. When 
nodes.length==0, the failed streamer won't call updateBlockForPipeline(). 
That's one reason you need HDFS-9040.
Agreed, the overridden {{updatePipelineInternal}} logic in HDFS-9040 will 
address this issue. As explained above, this patch will be rebased on top of 
HDFS-9040 once that patch is committed.
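
To make the gap concrete, here is a minimal standalone sketch of that failure 
path (everything except the {{setupPipelineForAppendOrRecovery}} and 
{{updateBlockForPipeline}} names is made up for illustration; it is not the 
HDFS-9040 code):
{code}
// Illustrative sketch only, not the actual DataStreamer / HDFS-9040 code.
// It shows why a failed streamer whose pipeline is trimmed to zero nodes
// never reaches updateBlockForPipeline(), so the GS bump has to be driven
// from the healthy (coordinating) side instead.
public class FailedStreamerSketch {

  private String[] nodes = new String[0]; // every node already marked bad

  void setupPipelineForAppendOrRecovery() {
    // bad nodes have already been trimmed from the pipeline at this point
    if (nodes.length == 0) {
      System.out.println("no healthy node left; updateBlockForPipeline() is skipped");
      return; // the streamer gives up here; no new GS is requested from the NN
    }
    updateBlockForPipeline(); // only reached when at least one node survives
  }

  void updateBlockForPipeline() {
    System.out.println("ask the NN for a new GS for this internal block");
  }

  public static void main(String[] args) {
    new FailedStreamerSketch().setupPipelineForAppendOrRecovery();
  }
}
{code}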

bq. The new way delays updatePipeline. One failure doesn't call it, only 
endBlock() will.
The code segment in {{case DN_ACCEPT_GS}} also updates the NN's copy of the 
block ({{storedBlock}}). The protocol is that once all healthy DNs accept the 
proposed GS, we update the NN (also described 
[here|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741972]).
This guarantees no "false-stale", meaning a fresh internal block will never be 
considered stale. But fundamentally it's hard to prevent "false-fresh"; we can 
only try to shorten the window during which a stale internal block can be 
considered fresh.
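
To make the ordering concrete, here is a minimal standalone sketch of what I 
mean (everything other than the {{DN_ACCEPT_GS}} state name and 
{{storedBlock}} is made up for illustration):
{code}
// Illustrative sketch of the ordering only, not the patch code itself.
// The NN copy (storedBlock) is updated only after every healthy streamer has
// reported that its DN accepted the proposed GS, so a fresh internal block
// can never end up behind the NN's GS ("no false-stale").
public class GsAcceptSketch {

  enum StreamerState { DN_ACCEPT_GS, FAILED, WAITING }

  private long proposedGS = 1001L;
  private long storedBlockGS = 1000L; // the NN's copy of the block group GS

  void checkStreamers(StreamerState[] states) {
    for (StreamerState s : states) {
      if (s == StreamerState.WAITING) {
        return; // some healthy DN has not accepted the proposed GS yet
      }
    }
    // all healthy streamers are in DN_ACCEPT_GS (failed ones are ignored):
    // only now do we update the NN copy of the block
    storedBlockGS = proposedGS;
    System.out.println("updatePipeline: storedBlock GS -> " + storedBlockGS);
  }

  public static void main(String[] args) {
    new GsAcceptSketch().checkStreamers(new StreamerState[] {
        StreamerState.DN_ACCEPT_GS, StreamerState.FAILED,
        StreamerState.DN_ACCEPT_GS });
  }
}
{code}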

bq. Assume client gets killed before endBlock(). Now every blocks can be 
accepted by blockReport. It affects lease recovery's judgement. 
My plan is to bump the GS of the NN's {{storedBlock}} to 1004 (1001 + 
{{NUM_PARITY_BLOCKS}}) during lease recovery. A healthy streamer also bumps 
the GS of its internal block (the DN's copy of the GS) to 1005 when it 
successfully finishes writing the internal block.
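
A small numeric sketch of that plan, assuming an RS-6-3 schema 
({{NUM_PARITY_BLOCKS}} = 3) and an initial GS of 1001; the acceptance check at 
the end is only my reading of the intent, not final code:
{code}
// Numeric illustration of the proposed GS bumping, not real recovery code.
public class GsBumpSketch {

  static final int NUM_PARITY_BLOCKS = 3;  // e.g. an RS-6-3 schema
  static final long INITIAL_GS = 1001L;

  public static void main(String[] args) {
    // lease recovery bumps the NN's storedBlock GS past the preallocated range
    long recoveryGS = INITIAL_GS + NUM_PARITY_BLOCKS;  // 1004
    // a streamer that finishes writing bumps its internal block one further
    long finishedGS = recoveryGS + 1;                  // 1005

    // internal blocks reported with a GS below the recovery GS were written
    // by a client that died before endBlock() and may be stale
    long[] reportedGS = { INITIAL_GS, 1003L, finishedGS };
    for (long gs : reportedGS) {
      System.out.println("internal block GS " + gs
          + (gs >= recoveryGS ? " is accepted" : " is treated as possibly stale"));
    }
  }
}
{code}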

bq. updatePipeline() is called when overlapping failures finally get handled, 
or just before endBlock()? 
See above; it's called "when overlapping failures finally get handled". It 
will also be called during {{endBlock}}.

> Erasure coding: preallocate multiple generation stamps and serialize updates 
> from data streamers
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9079
>                 URL: https://issues.apache.org/jira/browse/HDFS-9079
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7285
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-9079-HDFS-7285.00.patch
>
>
> A non-striped DataStreamer goes through the following steps in error handling:
> {code}
> 1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4) 
> Applies new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6) 
> Updates block on NN
> {code}
> To simplify the above we can preallocate GS when NN creates a new striped 
> block group ({{FSN#createNewBlock}}). For each new striped block group we can 
> reserve {{NUM_PARITY_BLOCKS}} GS's. Then steps 1~3 in the above sequence can 
> be saved. If more than {{NUM_PARITY_BLOCKS}} errors have happened we 
> shouldn't try to further recover anyway.
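
(For readers skimming the quoted description above: a minimal standalone 
sketch of the preallocation idea, with made-up names; only the arithmetic of 
reserving {{NUM_PARITY_BLOCKS}} GS values per block group comes from the 
description.)
{code}
// Illustrative sketch of per-block-group GS preallocation, not NN code.
public class GsPreallocationSketch {

  static final int NUM_PARITY_BLOCKS = 3;

  private final long firstGS;  // GS assigned when the block group is created
  private int used = 0;        // reserved GS values consumed so far

  GsPreallocationSketch(long firstGS) {
    this.firstGS = firstGS;
  }

  /** Returns the next reserved GS, or -1 once the reservation is exhausted. */
  long nextReservedGS() {
    if (used >= NUM_PARITY_BLOCKS) {
      return -1; // more than NUM_PARITY_BLOCKS failures: stop trying to recover
    }
    used++;
    return firstGS + used;
  }

  public static void main(String[] args) {
    GsPreallocationSketch group = new GsPreallocationSketch(1001L);
    for (int i = 1; i <= NUM_PARITY_BLOCKS + 1; i++) {
      System.out.println("GS handed out on failure " + i + ": "
          + group.nextReservedGS());
    }
  }
}
{code}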


