[jira] [Commented] (HDFS-8254) In StripedDataStreamer, it is hard to tolerate datanode failure in the leading streamer

Zhe Zhang (JIRA) Thu, 28 May 2015 17:28:57 -0700

    [ https://issues.apache.org/jira/browse/HDFS-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563982#comment-14563982 ]


Zhe Zhang commented on HDFS-8254:
---------------------------------

Thanks Nicholas for the patch! Our initial design put the {{locateFollowingBlock}} 
logic in the leading streamer for simplicity. I think it's a great idea to remove 
that single point of failure.

# The new {{ConcurrentPoll}} class looks good overall. Let me know if this 
understanding is correct: the fastest streamer now takes care of allocating 
the block group from the NameNode and distributing it to the other streamers. 
Can we add some Javadocs for the class and methods, and ideally, a design 
description on the JIRA?
# The concurrent logic is a little complex and some parts could be fragile. For 
example, the {{populate}} method in {{locateFollowingBlock}} directly changes 
the {{block}} field of the class. It's true that {{locateFollowingBlock}} is only 
used by {{nextBlockOutputStream}}, which will reassign a correct value to 
{{block}}. But this dependency makes {{locateFollowingBlock}} less 
self-contained. It also looks like we could run into a race condition if 2 
streamers enter {{locateFollowingBlock}} around the same time: they could both 
pass {{isReady2Populate}} before either one has started taking from 
{{endBlocks}}. Since {{DataStreamer#locateFollowingBlock}} is not complex, can 
we do some refactoring and move it to the output stream level? This way the 
{{coordinator}} can take care of the main logic and the fastest streamer just 
has to trigger it. I haven't thought through {{updateBlockForPipeline}} and 
{{updatePipeline}} yet but I guess the stories should be similar.
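
To make the concern in the two points above concrete, here is a minimal sketch of the "first arriver allocates for everyone" pattern, with the readiness check and the populate step performed under a single lock so that two streamers cannot both pass the check. Everything in it is hypothetical and for illustration only: {{GroupCoordinator}}, {{take}}, {{allocationsInOneRound}} and the {{Supplier}} that stands in for the NameNode call are not names from the patch, and the sketch leaves out the wait for the previous block to finish ({{endBlocks}}).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch only; this is not the class from the patch. */
class GroupCoordinator<T> {
  // One queue per streamer, indexed by streamer index.
  private final List<Queue<T>> queues = new ArrayList<>();
  // Stands in for the NameNode allocation call; returns one item per streamer.
  private final Supplier<List<T>> allocator;

  GroupCoordinator(int numStreamers, Supplier<List<T>> allocator) {
    this.allocator = allocator;
    for (int i = 0; i < numStreamers; i++) {
      queues.add(new ArrayDeque<>());
    }
  }

  /**
   * Returns the next item for streamer i. The emptiness check and the
   * populate step run under the same lock, so only the first streamer to
   * find its queue empty triggers an allocation; the others simply take.
   */
  synchronized T take(int i) {
    Queue<T> mine = queues.get(i);
    if (mine.isEmpty()) {
      List<T> group = allocator.get();
      for (int j = 0; j < queues.size(); j++) {
        queues.get(j).add(group.get(j));
      }
    }
    return mine.remove();
  }

  /** Runs one round with concurrent streamers; returns the allocation count. */
  static int allocationsInOneRound(int numStreamers) throws InterruptedException {
    AtomicInteger allocations = new AtomicInteger();
    GroupCoordinator<String> c = new GroupCoordinator<>(numStreamers, () -> {
      int id = allocations.incrementAndGet();
      List<String> group = new ArrayList<>();
      for (int j = 0; j < numStreamers; j++) {
        group.add("group" + id + "-block" + j);
      }
      return group;
    });
    List<Thread> threads = new ArrayList<>();
    for (int i = 0; i < numStreamers; i++) {
      final int index = i;
      Thread t = new Thread(() -> c.take(index));
      threads.add(t);
      t.start();
    }
    for (Thread t : threads) {
      t.join();
    }
    return allocations.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("allocations: " + allocationsInOneRound(9));
  }
}
```

If the check and the populate step were split across two separately locked sections, two threads could both observe an empty queue and both allocate, which is the race described above. Keeping this logic in one coordinator object also means a streamer only has to trigger it and never touches shared state directly.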

Nits:
# {{DFSStripedOutputStream}} already has the schema but the streamers are still 
using constants. We should either use the schema or at least add some TODOs.
# New methods in {{DataStreamer}} could use some Javadoc.
# {{class Coordinator}} could be renamed to something like 
{{StreamersCoordinator}}, just to be more specific.
# The Javadoc of {{StripedBlockUtil#checkBlocks}} should say that it checks that 
the two blocks are in the same block group.

> In StripedDataStreamer, it is hard to tolerate datanode failure in the 
> leading streamer
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-8254
>                 URL: https://issues.apache.org/jira/browse/HDFS-8254
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>         Attachments: h8254_20150526.patch, h8254_20150526b.patch
>
>
> StripedDataStreamer javadoc is shown below.
> {code}
>  * The StripedDataStreamer class is used by {@link DFSStripedOutputStream}.
>  * There are two kinds of StripedDataStreamer, leading streamer and ordinary
>  * stream. Leading streamer requests a block group from NameNode, unwraps
>  * it to located blocks and transfers each located block to its corresponding
>  * ordinary streamer via a blocking queue.
> {code}
> The leading streamer is the streamer with index 0.  When the datanode of the 
> leading streamer fails, the other streamers cannot continue since no one will 
> request a block group from NameNode anymore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
