[
https://issues.apache.org/jira/browse/HDFS-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009951#comment-15009951
]
Uma Maheswara Rao G commented on HDFS-9079:
-------------------------------------------
[~zhz], thanks for the great effort and nice proposals.
{code}
long gs = blockIdManager.nextGenerationStamp(legacyBlock);
+ for (int i = 0; i < preAllocate; i++) {
+ blockIdManager.nextGenerationStamp(false);
+ }
+
if (legacyBlock) {
getEditLog().logGenerationStampV1(gs);
} else {
{code}
I have a general question here: when an HA switch happens, how does the other
NN (standby -> active) learn about these incremented genstamps?
The client and the current active NN know that the client can use up to the
preallocated genstamps, but we persist only the genstamp currently in use. So
when an NN switch happens, the other NN knows only the current in-use genstamp.
What if the client, due to some failure, has already moved on to
currentGenstamp + 1 (a preallocated genstamp)? Then, when another client asks
for a genstamp, the newly active NN may hand out currentGenstamp + 1 again,
right?
Another case to consider: in HA, the standby NN postpones blocks reported by
DNs based on generation stamp. If a DN reports a block with a genstamp before
the NNs have really synced the edits for that block, the standby NN will
postpone the block if the reported genstamp is lower than the one it knows. I
am wondering whether any analysis has been done from this perspective as well?
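The failover race described above can be sketched with a toy counter. This is a hypothetical model (the class and method names are illustrative, not the actual {{BlockIdManager}} API): the active NN preallocates stamps but persists only the in-use one, so a standby that takes over can reissue a stamp the old client may already be using.

```java
// Hypothetical sketch of the genstamp preallocation race; not real HDFS code.
class GenStampSketch {
    // Simplified stand-in for the NN's generation-stamp counter.
    static class NameNode {
        long genStamp;
        NameNode(long persistedGenStamp) { genStamp = persistedGenStamp; }
        long next() { return ++genStamp; }
    }

    public static void main(String[] args) {
        final int preAllocate = 3;        // e.g. NUM_PARITY_BLOCKS
        NameNode active = new NameNode(100);

        long current = active.next();     // 101: the stamp written to the edit log
        long persisted = current;         // only the in-use stamp is persisted
        for (int i = 0; i < preAllocate; i++) {
            active.next();                // 102..104 reserved, NOT persisted
        }

        // On a streamer failure the client may advance to current + 1
        // without another NN round trip.
        long clientUses = current + 1;    // 102

        // Failover: the standby becomes active knowing only the persisted stamp.
        NameNode newActive = new NameNode(persisted);
        long reissued = newActive.next(); // 102 again

        // Both values are 102: the new active NN can hand out a stamp the
        // old client has already applied on DataNodes.
        System.out.println("client=" + clientUses + " reissued=" + reissued);
    }
}
```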
> Erasure coding: preallocate multiple generation stamps and serialize updates
> from data streamers
> ------------------------------------------------------------------------------------------------
>
> Key: HDFS-9079
> URL: https://issues.apache.org/jira/browse/HDFS-9079
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: erasure-coding
> Affects Versions: HDFS-7285
> Reporter: Zhe Zhang
> Assignee: Zhe Zhang
> Attachments: HDFS-9079-HDFS-7285.00.patch, HDFS-9079.01.patch,
> HDFS-9079.02.patch, HDFS-9079.03.patch, HDFS-9079.04.patch,
> HDFS-9079.05.patch, HDFS-9079.06.patch, HDFS-9079.07.patch,
> HDFS-9079.08.patch, HDFS-9079.09.patch, HDFS-9079.10.patch, HDFS-9079.11.patch
>
>
> A non-striped DataStreamer goes through the following steps in error handling:
> {code}
> 1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4)
> Applies new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6)
> Updates block on NN
> {code}
> With multiple streamer threads running in parallel, we need to correctly
> handle a large number of possible combinations of interleaved thread events.
> For example, {{streamer_B}} starts step 2 in between events {{streamer_A.2}}
> and {{streamer_A.3}}.
> HDFS-9040 moves steps 1, 2, 3, 6 from streamer to {{DFSStripedOutputStream}}.
> This JIRA proposes some further optimizations based on HDFS-9040:
> # We can preallocate GS when NN creates a new striped block group
> ({{FSN#createNewBlock}}). For each new striped block group we can reserve
> {{NUM_PARITY_BLOCKS}} GS's. If more than {{NUM_PARITY_BLOCKS}} errors have
> happened we shouldn't try to further recover anyway.
> # We can use a dedicated event processor to offload the error handling logic
> from {{DFSStripedOutputStream}}, which is not a long running daemon.
> # We can limit the lifespan of a streamer to be a single block. A streamer
> ends either after finishing the current block or when encountering a DN
> failure.
> With the proposed change, a {{StripedDataStreamer}}'s flow becomes:
> {code}
> 1) Finds DN error => 2) Notify coordinator (async, not waiting for response)
> => terminates
> 1) Finds external error => 2) Applies new GS to DN (createBlockOutputStream)
> => 3) Ack from DN => 4) Notify coordinator (async, not waiting for response)
> {code}
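The coordinator idea in the quoted proposal (streamers notify asynchronously and terminate; a single event processor serializes the handling) could be sketched roughly as below. The class names are illustrative assumptions, not the actual HDFS-9079 patch.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: streamers post failure events without waiting for a
// response, and one coordinator thread handles them serially.
class CoordinatorSketch {
    enum EventType { DN_ERROR, EXTERNAL_ERROR }

    static class StreamerEvent {
        final int streamerIndex;
        final EventType type;
        StreamerEvent(int streamerIndex, EventType type) {
            this.streamerIndex = streamerIndex;
            this.type = type;
        }
    }

    static class Coordinator implements Runnable {
        final BlockingQueue<StreamerEvent> events = new ArrayBlockingQueue<>(16);
        volatile int handled = 0;

        // Streamer side: fire-and-forget notification, no response expected.
        void notifyAsync(StreamerEvent e) { events.offer(e); }

        @Override public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    events.take();
                    // Serialized error handling would go here, e.g. bumping
                    // the block group to the next preallocated genstamp.
                    handled++;
                }
            } catch (InterruptedException ignored) {
                // Shutdown requested; exit the processing loop.
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Coordinator c = new Coordinator();
        Thread t = new Thread(c);
        t.start();
        // A streamer that hit a DN error notifies and then terminates.
        c.notifyAsync(new StreamerEvent(0, EventType.DN_ERROR));
        c.notifyAsync(new StreamerEvent(3, EventType.EXTERNAL_ERROR));
        Thread.sleep(200);
        t.interrupt();
        t.join();
        System.out.println("handled=" + c.handled);
    }
}
```

Because a single thread drains the queue, the interleaving problem described in the issue collapses to a total order of events, at the cost of one extra hop per failure.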
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)