[ https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728959#comment-14728959 ]
Walter Su commented on HDFS-8383:
---------------------------------
There was a debate long ago about parallel write versus pipeline write.
Parallel write did not look very compelling. If HDFS supported parallel write,
implementing DFSStripedOutputStream would be quite easy.
DFSStripedOutputStream/StripedDataStreamer is very much like parallel write.
If you change DFSStripedOutputStream.writeChunk(..), you can easily do parallel
write for non-EC files. We have done the heavy lifting (synchronization), but
we don't want to change much of the existing pipeline code.
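To make the comparison concrete, here is a toy sketch of the two write topologies. It is not HDFS code: {{WriteTopologySketch}}, {{Node}}, {{pipelineWrite}} and {{parallelWrite}} are hypothetical names, and the nodes are in-memory lists rather than DNs. In the pipeline form the client sends a chunk once and each node forwards it downstream; in the parallel form the client sends to every node itself.

```java
import java.util.ArrayList;
import java.util.List;

final class WriteTopologySketch {
    /** Hypothetical stand-in for a DataNode: stores chunks and may forward them. */
    static final class Node {
        final List<String> stored = new ArrayList<>();
        Node downstream; // next node in a pipeline, or null if there is none

        void receive(String chunk) {
            stored.add(chunk);
            if (downstream != null) {
                downstream.receive(chunk); // pipeline hop: node-to-node forwarding
            }
        }
    }

    /** Pipeline write: the client talks to the first node only. Returns the number of client sends. */
    static int pipelineWrite(List<Node> nodes, String chunk) {
        if (nodes.isEmpty()) {
            return 0;
        }
        for (int i = 0; i < nodes.size(); i++) {
            nodes.get(i).downstream = (i + 1 < nodes.size()) ? nodes.get(i + 1) : null;
        }
        nodes.get(0).receive(chunk);
        return 1;
    }

    /** Parallel write: the client sends to each node directly. Returns the number of client sends. */
    static int parallelWrite(List<Node> nodes, String chunk) {
        for (Node n : nodes) {
            n.downstream = null; // no forwarding between nodes
            n.receive(chunk);
        }
        return nodes.size();
    }
}
```

Both calls leave the chunk on every node; the difference is that the parallel client has one connection per node to drive and keep in sync, which is the synchronization work mentioned above.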
bq. Right now when a DN (e.g. DN_0) fails, we handle other streams (DN_1~DN_5)
as if each of them has a failed DN. We trigger processDatanodeError to close
the stream and open again with the same DN. This overhead isn't really
necessary. IIUC all we want to do is to bump the GenerationStamp for internal
blocks 1~5. Can we do it by sending a packet (or piggybacking with a data
packet) to DN?
I think that is incompatible, because it changes the protocol of the pipeline
mechanism. There is nothing I can do for a single failure. For multiple
failures, I do suggest interrupting the ongoing recovery to reduce the number
of stream open/close operations. I have added a TODO.
bq. By doing the above we can also simplify the error handling logic. All we
need is an AtomicInteger groupGS in DFSStripedOutputStream recording the
current GS. Each failed streamer should increment groupGS. Each streamer can
compare groupGS with its current GS before sending the next packet.
Without improvement #2, this is just a matter of passive versus active.
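For reference, a minimal sketch of the quoted groupGS idea, under stated assumptions. Only the {{AtomicInteger groupGS}} and the "compare before sending the next packet" rule come from the quote; {{GroupGenerationStampSketch}}, {{GroupState}}, {{Streamer}}, {{onFailure}} and {{sendNextPacket}} are hypothetical names, and no DN or NN communication is modelled.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class GroupGenerationStampSketch {
    /** Shared by all streamers of one block group; holds the quoted "groupGS". */
    static final class GroupState {
        final AtomicInteger groupGS;

        GroupState(int initialGS) {
            this.groupGS = new AtomicInteger(initialGS);
        }
    }

    /** Hypothetical streamer; a real one would also talk to a DataNode. */
    static final class Streamer {
        private final GroupState group;
        private int currentGS;
        private int bumps; // how many times this streamer had to catch up

        Streamer(GroupState group) {
            this.group = group;
            this.currentGS = group.groupGS.get();
        }

        /** Called when this streamer's DataNode fails: advance the group-wide GS. */
        void onFailure() {
            group.groupGS.incrementAndGet();
        }

        /** Compares groupGS with the local GS before sending; returns the GS used for the packet. */
        int sendNextPacket() {
            int latest = group.groupGS.get();
            if (latest != currentGS) {
                // Assumption: this is where the internal block's GS would be updated.
                currentGS = latest;
                bumps++;
            }
            return currentGS;
        }

        int bumps() {
            return bumps;
        }
    }
}
```

In this form a healthy streamer only notices a failure when it is about to send its next packet; nothing tells it to stop and recover.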
bq. Regardless of this change, the write error handling logic is already very
complex IMO. Maybe we can consider moving locateFollowingBlock to OutputStream
level so the streamer's task is capped within a single block. For non-EC files
this refactor will also facilitate HDFS-8955.
OutputStream and streamer have different roles to play. I think
{{locateFollowingBlock}} belongs to the streamer. Actually, it should belong to
a single {{BlockGroupDataStreamer}} that communicates with the NN to
allocate/update blocks, while {{StripedDataStreamer}} only has to stream blocks
to the DNs. But I think it's OK not to separate them and just let the fastest
streamer take the job.
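One way to picture "let the fastest streamer take the job" is the sketch below. It is an illustration, not the patch: {{FastestStreamerAllocatesSketch}}, {{locateFollowingBlockGroup}} and {{allocateFromNameNode}} are hypothetical names, and the NN call is replaced by a counter. Whichever streamer asks first for a given block group performs the allocation; the others reuse its result.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class FastestStreamerAllocatesSketch {
    private final ConcurrentHashMap<Integer, String> groups = new ConcurrentHashMap<>();
    private final AtomicInteger nameNodeCalls = new AtomicInteger();

    /** Hypothetical stand-in for the NameNode RPC that allocates a block group. */
    private String allocateFromNameNode(int groupIndex) {
        nameNodeCalls.incrementAndGet();
        return "blockGroup-" + groupIndex;
    }

    /** Called by every streamer; computeIfAbsent runs the allocation at most once per index. */
    String locateFollowingBlockGroup(int groupIndex) {
        return groups.computeIfAbsent(groupIndex, this::allocateFromNameNode);
    }

    int nameNodeCalls() {
        return nameNodeCalls.get();
    }

    /** Starts the given number of streamer threads racing for block group 0; returns the allocation count. */
    static int demo(int streamers) {
        FastestStreamerAllocatesSketch shared = new FastestStreamerAllocatesSketch();
        Thread[] threads = new Thread[streamers];
        for (int i = 0; i < streamers; i++) {
            threads[i] = new Thread(() -> shared.locateFollowingBlockGroup(0));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.nameNodeCalls();
    }
}
```

The sketch leans on {{ConcurrentHashMap.computeIfAbsent}} to pick the winner, so no separate coordinator class is needed, which is the trade-off described above.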
> Tolerate multiple failures in DFSStripedOutputStream
> ----------------------------------------------------
>
> Key: HDFS-8383
> URL: https://issues.apache.org/jira/browse/HDFS-8383
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Walter Su
> Attachments: HDFS-8383.00.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)