[
https://issues.apache.org/jira/browse/HDFS-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964015#comment-14964015
]
Zhe Zhang edited comment on HDFS-9098 at 10/22/18 7:14 PM:
-----------------------------------------------------------
WIP patch to demonstrate the idea. It leverages ideas from the
[IMUnit|http://mir.cs.illinois.edu/imunit/] paper and
[sync_point testing|https://github.com/apache/kudu/blob/67f2486891378c8344ac3f68f9e8f344b74881af/src/kudu/util/sync_point.cc]
in Kudu.
The logic of {{syncPoint}} is still hacky because it has to serve as both a
synchronization point and a fault injector. I'm working on a better structure.
The added {{TestStripedDataStreamers}} has a very simple test that emulates the
case where a second failure happens during the {{updateBlockForPipeline}} call
for the first failure. Ideally, I think we need to create {{BEFORE}} and
{{AFTER}} events, such as {{BEFORE_UPDATE_BLOCK_FOR_PIPELINE}} and
{{AFTER_UPDATE_BLOCK_FOR_PIPELINE}}. The {{WRITE_CHUNK}} event is also a little
tricky, because we need to emulate {{writeChunk}} for a specific offset.
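To make the {{BEFORE}}/{{AFTER}} idea concrete, here is a minimal sketch of a sync point that forces an ordering between events on different threads. It is not the {{syncPoint}} from the patch and not Kudu's API: the class {{SimpleSyncPoint}}, its methods, and the {{SECOND_FAILURE}} event name are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not the syncPoint in the WIP patch, not Kudu's SyncPoint.
class SimpleSyncPoint {
  // For each event, the events that must be reached before it may proceed.
  private final Map<String, Set<String>> predecessors = new HashMap<>();
  private final Set<String> cleared = new HashSet<>();
  private final List<String> history = new ArrayList<>();

  // Declare that "successor" must wait until "predecessor" has been reached.
  synchronized void addDependency(String predecessor, String successor) {
    predecessors.computeIfAbsent(successor, k -> new HashSet<>()).add(predecessor);
  }

  // Called at the instrumented place; blocks until all predecessors are cleared.
  synchronized void process(String point) throws InterruptedException {
    Set<String> needed = predecessors.getOrDefault(point, Collections.emptySet());
    while (!cleared.containsAll(needed)) {
      wait();
    }
    cleared.add(point);
    history.add(point); // recorded under the lock, so the order is exact
    notifyAll();
  }

  synchronized List<String> history() {
    return new ArrayList<>(history);
  }

  // Forces a (hypothetical) second failure to land between the BEFORE and
  // AFTER events of the first failure's updateBlockForPipeline.
  static List<String> demo() {
    SimpleSyncPoint sp = new SimpleSyncPoint();
    sp.addDependency("BEFORE_UPDATE_BLOCK_FOR_PIPELINE", "SECOND_FAILURE");
    sp.addDependency("SECOND_FAILURE", "AFTER_UPDATE_BLOCK_FOR_PIPELINE");

    Thread firstFailure = new Thread(() -> {
      reach(sp, "BEFORE_UPDATE_BLOCK_FOR_PIPELINE");
      reach(sp, "AFTER_UPDATE_BLOCK_FOR_PIPELINE");
    });
    Thread secondFailure = new Thread(() -> reach(sp, "SECOND_FAILURE"));

    secondFailure.start(); // started first on purpose; it still has to wait
    firstFailure.start();
    try {
      firstFailure.join();
      secondFailure.join();
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    }
    return sp.history();
  }

  private static void reach(SimpleSyncPoint sp, String point) {
    try {
      sp.process(point);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

{{SimpleSyncPoint.demo()}} returns the events in the declared order regardless of thread scheduling. The sketch only orders events; it does not inject faults, which is one way the two roles could be separated.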
> Erasure coding: emulate race conditions among striped streamers in write
> pipeline
> ---------------------------------------------------------------------------------
>
> Key: HDFS-9098
> URL: https://issues.apache.org/jira/browse/HDFS-9098
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: erasure-coding
> Affects Versions: 3.0.0-alpha1
> Reporter: Zhe Zhang
> Assignee: Zhe Zhang
> Priority: Major
> Attachments: HDFS-9098.00.patch, HDFS-9098.wip.patch
>
>
> Apparently the interleaving of events among {{StripedDataStreamer}} instances
> is very tricky to handle. [~walter.k.su] and [~jingzhao] have discussed
> several race conditions under HDFS-9040.
> Let's use FaultInjector to emulate different combinations of interleaved
> events.
> In particular, we should consider injecting delays in the following places:
> # {{Streamer#endBlock}}
> # {{Streamer#locateFollowingBlock}}
> # {{Streamer#updateBlockForPipeline}}
> # {{Streamer#updatePipeline}}
> # {{OutputStream#writeChunk}}
> # {{OutputStream#close}}
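
A rough sketch of how delays at the places listed above could be injected, assuming a swappable static injector with no-op hooks that a test replaces. Every class and method name here ({{StreamerFaultInjector}}, {{DelayingInjector}}, {{FakeStreamer}}, {{delay}}, {{delayAt}}) is hypothetical and does not refer to an existing Hadoop class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical injector: production code calls get().delay(...), a no-op by default.
class StreamerFaultInjector {
  private static volatile StreamerFaultInjector instance = new StreamerFaultInjector();

  static StreamerFaultInjector get() { return instance; }
  static void set(StreamerFaultInjector injector) { instance = injector; }

  // Overridden in tests to stall the caller at a named place.
  void delay(String point) { }
}

// Test-side injector: sleeps at the configured places and records which were hit.
class DelayingInjector extends StreamerFaultInjector {
  private final Map<String, Long> delaysMs = new HashMap<>(); // filled before use
  private final List<String> delayed =
      Collections.synchronizedList(new ArrayList<>());

  DelayingInjector delayAt(String point, long millis) {
    delaysMs.put(point, millis);
    return this;
  }

  @Override
  void delay(String point) {
    Long millis = delaysMs.get(point);
    if (millis == null) {
      return; // no delay configured for this place
    }
    delayed.add(point);
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  List<String> delayedPoints() {
    return new ArrayList<>(delayed);
  }
}

// Stand-in for a streamer; the real work after each hook is omitted.
class FakeStreamer {
  void updateBlockForPipeline() {
    StreamerFaultInjector.get().delay("Streamer#updateBlockForPipeline");
  }
  void updatePipeline() {
    StreamerFaultInjector.get().delay("Streamer#updatePipeline");
  }
  void endBlock() {
    StreamerFaultInjector.get().delay("Streamer#endBlock");
  }
}

class FaultInjectorDemo {
  static List<String> run() {
    DelayingInjector injector = new DelayingInjector()
        .delayAt("Streamer#updateBlockForPipeline", 20)
        .delayAt("Streamer#endBlock", 10);
    StreamerFaultInjector.set(injector);
    try {
      FakeStreamer streamer = new FakeStreamer();
      streamer.updateBlockForPipeline();
      streamer.updatePipeline(); // not configured, so not delayed
      streamer.endBlock();
    } finally {
      StreamerFaultInjector.set(new StreamerFaultInjector()); // restore the no-op
    }
    return injector.delayedPoints();
  }
}
```

Keying the delays by place name lets one test pick any combination of the listed places without adding a new hook method per place; fixed sleeps only shift timing, though, so they make an interleaving likely rather than guaranteed.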