[ https://issues.apache.org/jira/browse/HDFS-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010399#comment-15010399 ]

Zhe Zhang commented on HDFS-9079:
---------------------------------

Thanks for taking a look, Uma.

Good point on HA. One quick solution is for a standby to always assume that the 
largest GS it has seen was part of a preallocation, and to avoid using the next 
{{MAX_PREALLOCATION}} GSes. We can define {{MAX_PREALLOCATION}} as the largest 
possible preallocation size. In the context of EC blocks we preallocate _k_ GSes, 
where _k_ is the number of parity blocks in the schema, so 8 should be sufficient 
as an upper bound.
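
A minimal sketch of that standby-side rule, to make the idea concrete. The class and 
method names below ({{StandbyGenStampTracker}}, {{nextSafeGenerationStamp}}) are 
illustrative, not taken from the attached patches:
{code}
// Illustrative sketch only; names and structure are hypothetical.
public class StandbyGenStampTracker {
  // Largest possible preallocation: k parity GSes per EC group; 8 is a safe upper bound.
  private static final long MAX_PREALLOCATION = 8;

  private long largestSeenGS = 0;

  /** Record a GS observed from replayed edits or block reports. */
  public synchronized void observe(long gs) {
    largestSeenGS = Math.max(largestSeenGS, gs);
  }

  /**
   * The next GS the standby may safely hand out after a failover: always assume
   * the largest seen GS started a preallocated range and skip past that range.
   */
  public synchronized long nextSafeGenerationStamp() {
    return largestSeenGS + MAX_PREALLOCATION + 1;
  }
}
{code}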

bq. Another case to consider: in HA, the standby NN postpones blocks reported from 
DNs based on generation stamp. If a DN reports the block with a genstamp before the 
NNs have really synced the edits about that block, the standby NN will postpone the 
block if the reported genstamp is lesser than the one it knows. I am wondering 
whether any analysis was done from this perspective as well?
The protocol proposed in this JIRA guarantees that a healthy DN (one that hits no 
error during the write) always has the same or a higher GS than the NN's copy. If 
the SbNN has not synced all edits for the block, the SbNN's copy of the block's GS 
should be even smaller, which makes it smaller than the DN's copy, right?
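
Spelling out that ordering argument as a check, following the framing in the quoted 
question (a sketch only; the method below is hypothetical):
{code}
// For a healthy DN (no write error):        dnGS >= activeNnGS
// If the SbNN lags on edits for that block: activeNnGS >= sbNnGS
// Therefore dnGS >= sbNnGS, so a healthy DN never reports a *lesser* GS to the
// standby, and the postponement case described above is not triggered by it.
static boolean shouldPostpone(long reportedDnGS, long sbNnGS) {
  return reportedDnGS < sbNnGS;
}
{code}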

> Erasure coding: preallocate multiple generation stamps and serialize updates 
> from data streamers
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9079
>                 URL: https://issues.apache.org/jira/browse/HDFS-9079
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: HDFS-7285
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-9079-HDFS-7285.00.patch, HDFS-9079.01.patch, 
> HDFS-9079.02.patch, HDFS-9079.03.patch, HDFS-9079.04.patch, 
> HDFS-9079.05.patch, HDFS-9079.06.patch, HDFS-9079.07.patch, 
> HDFS-9079.08.patch, HDFS-9079.09.patch, HDFS-9079.10.patch, HDFS-9079.11.patch
>
>
> A non-striped DataStreamer goes through the following steps in error handling:
> {code}
> 1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN =>
> 4) Applies new GS to DN (createBlockOutputStream) => 5) Ack from DN =>
> 6) Updates block on NN
> {code}
> With multiple streamer threads running in parallel, we need to correctly handle a 
> large number of possible combinations of interleaved thread events. For 
> example, {{streamer_B}} starts step 2 in between events {{streamer_A.2}} and 
> {{streamer_A.3}}.
> HDFS-9040 moves steps 1, 2, 3, 6 from streamer to {{DFSStripedOutputStream}}. 
> This JIRA proposes some further optimizations based on HDFS-9040:
> # We can preallocate GSes when the NN creates a new striped block group 
> ({{FSN#createNewBlock}}). For each new striped block group we can reserve 
> {{NUM_PARITY_BLOCKS}} GSes. If more than {{NUM_PARITY_BLOCKS}} errors have 
> happened, we shouldn't try to recover further anyway.
> # We can use a dedicated event processor to offload the error handling logic 
> from {{DFSStripedOutputStream}}, which is not a long-running daemon.
> # We can limit the lifespan of a streamer to be a single block. A streamer 
> ends either after finishing the current block or when encountering a DN 
> failure.
> With the proposed change, a {{StripedDataStreamer}}'s flow becomes:
> {code}
> 1) Finds DN error => 2) Notifies coordinator (async, not waiting for response) => terminates
> 1) Finds external error => 2) Applies new GS to DN (createBlockOutputStream) =>
> 3) Ack from DN => 4) Notifies coordinator (async, not waiting for response)
> {code}
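
For reference, a rough sketch of the GS preallocation described in item 1 of the 
quoted description above; the names ({{GenerationStampPreallocator}}, 
{{reserveGenerationStamps}}) are illustrative, not taken from the attached patches:
{code}
// Hypothetical sketch: reserve extra GSes when a new striped block group is created,
// so streamers never round-trip to the NN for a fresh GS during error recovery.
class GenerationStampPreallocator {
  private long nextGenerationStamp = 1000;   // stand-in for the NN's real GS counter
  static final int NUM_PARITY_BLOCKS = 3;    // e.g. RS-6-3: at most 3 recoverable failures

  /** Reserve the initial GS plus NUM_PARITY_BLOCKS spares for one block group. */
  synchronized long[] reserveGenerationStamps() {
    long[] reserved = new long[NUM_PARITY_BLOCKS + 1];
    for (int i = 0; i < reserved.length; i++) {
      reserved[i] = nextGenerationStamp++;
    }
    // reserved[0] is the block group's initial GS; reserved[1..k] are bumped to
    // on successive DN failures without contacting the NN again.
    return reserved;
  }
}
{code}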


