[ https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728246#comment-14728246 ]
Zhe Zhang commented on HDFS-8383:
---------------------------------
Thanks Walter for creating the patch. Below is a list of comments, some on the
overall write fault tolerance design, and others on this patch.
# {{DataStreamer#ErrorState#externalError}} looks like a key concept. [~szetszwo]:
Does it mean "error from peer streamers"? We should take this chance to add a
Javadoc.
# Right now when a DN (e.g. DN_0) fails, we handle the other streamers
(DN_1~DN_5) as if each of them has a failed DN. We trigger
{{processDatanodeError}} to close each stream and reopen it with the same DN.
This overhead isn't really necessary. IIUC all we want to do is to bump the
{{GenerationStamp}} for internal blocks 1~5. Can we do it by sending a packet
(or piggybacking on a data packet) to the DN?
# By doing the above we can also simplify the error handling logic. All we need
is an {{AtomicInteger groupGS}} in {{DFSStripedOutputStream}} recording the
current GS. Each failed streamer should increment {{groupGS}}. Each streamer
can compare {{groupGS}} with its current GS before sending the next packet.
# Regardless of this change, the write error handling logic is already very
complex IMO. Maybe we can consider moving {{locateFollowingBlock}} to the
OutputStream level so that a streamer's task is limited to a single block. For
non-EC files this refactoring will also facilitate HDFS-8955.
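
To make the {{groupGS}} idea in items 2 and 3 concrete, here is a rough sketch. It is not taken from the patch: {{StripedGroupState}} and {{StreamerSketch}} are hypothetical stand-ins for {{DFSStripedOutputStream}} and {{DataStreamer}}, and the step that actually sends the new GS to the DN is left as a comment.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the shared state held by DFSStripedOutputStream. */
class StripedGroupState {
    // Current generation stamp of the block group, shared by all streamers.
    final AtomicInteger groupGS = new AtomicInteger(0);
}

/** Hypothetical stand-in for one per-DN streamer. */
class StreamerSketch {
    private final StripedGroupState group;
    private int currentGS;          // GS this streamer last used
    private boolean failed = false; // true once this streamer's DN has failed

    StreamerSketch(StripedGroupState group) {
        this.group = group;
        this.currentGS = group.groupGS.get();
    }

    /** Called when this streamer's DN fails: bump the group GS exactly once. */
    void markFailed() {
        if (!failed) {
            failed = true;
            group.groupGS.incrementAndGet();
        }
    }

    /**
     * Called before sending the next packet. Returns true if the streamer had
     * to adopt a newer GS. In a real implementation this is where the bump
     * would be sent to the DN, in its own packet or piggybacked on a data packet.
     */
    boolean syncBeforeNextPacket() {
        if (failed) {
            return false; // a failed streamer sends nothing further
        }
        int latest = group.groupGS.get();
        if (latest != currentGS) {
            currentGS = latest;
            return true;
        }
        return false;
    }

    int currentGS() {
        return currentGS;
    }
}
```

With this shape a healthy streamer never closes and reopens its stream when a peer fails. It only notices the changed {{groupGS}} at its next packet boundary. Whether DNs can accept a GS bump this way is exactly the open question in item 2.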
Nits on the patch:
# Is {{BlockRecoveryTrigger}} a singleton? If so, do we need the synchronization?
# {{private Integer numScheduled}} looks like it's a boolean?
> Tolerate multiple failures in DFSStripedOutputStream
> ----------------------------------------------------
>
> Key: HDFS-8383
> URL: https://issues.apache.org/jira/browse/HDFS-8383
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Walter Su
> Attachments: HDFS-8383.00.patch
>
>