[ 
https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712581#comment-14712581
 ] 

Li Bo commented on HDFS-8704:
-----------------------------

Thanks for Zhe's review. Handling datanode failure across multiple block groups 
is complex, so let me give a detailed explanation.

Suppose packets {{p1,p2,p3,p4}} belong to the first block and {{p5,p6,p7,p8}} 
belong to the next one. Now the streamer fails to send packet {{p2}} and sets 
its {{streamerClosed}} flag to true, while its {{dataQueue}} still contains 
{{p3,p4,p5}}. {{DFSStripedOutputStream}} will then fail to write the data of 
packet {{p6}} and mark this streamer as failed, after which it will not write 
{{p7,p8}} to that streamer any more. But the streamer may later be allocated a 
good datanode, in which case {{p5,p6,p7,p8}} could have been written to that 
datanode successfully. We only see {{StripedDataStreamer#setFailed(true)}}; 
where and when should the streamer be marked as not failed?
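
To make this concrete, here is a hypothetical, condensed stand-in for the 
streamer state in question (not the real {{StripedDataStreamer}}, just the 
failed flag): the flag is only ever set, never cleared, so a streamer that 
later receives a healthy datanode stays skipped.

{code:java}
// Hypothetical condensation of the streamer state in question; the real
// class is StripedDataStreamer, this stand-in keeps only the failed flag.
public class StreamerStateSketch {
  private volatile boolean failed = false;

  public void setFailed(boolean failed) {
    this.failed = failed;
  }

  public boolean isFailed() {
    return failed;
  }

  public static void main(String[] args) {
    StreamerStateSketch streamer = new StreamerStateSketch();
    streamer.setFailed(true); // set when the p6 write fails
    // ...the next block group starts and a healthy datanode is allocated...
    System.out.println(streamer.isFailed()); // still true, so p7,p8 are skipped
  }
}
{code}
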
Because {{StripedDataStreamer}} and {{DFSStripedOutputStream}} run 
asynchronously, it's more reasonable to keep {{DFSStripedOutputStream}} unaware 
of the status of individual streamers unless there are no longer enough healthy 
streamers. When a streamer fails to connect to a datanode or to write some 
packet, it removes the remaining trivial packets of the current block from its 
{{dataQueue}} and waits for the next block to be allocated to it, as sketched 
below.
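
As a rough illustration of that removal, here is a minimal, self-contained 
sketch; {{Packet}} and {{removeTrivialPackets}} are hypothetical stand-ins for 
{{DFSPacket}} and the real queue handling, assuming each queued packet carries 
an end-of-block marker:

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// A minimal sketch of dropping the "trivial" packets of the failed block.
// Packet is a hypothetical stand-in for DFSPacket, keeping only the
// end-of-block marker that the real packets carry.
public class TrivialPacketRemoval {
  static class Packet {
    final int seqno;
    final boolean lastPacketInBlock;
    Packet(int seqno, boolean lastPacketInBlock) {
      this.seqno = seqno;
      this.lastPacketInBlock = lastPacketInBlock;
    }
  }

  // Discard queued packets up to and including the end-of-block marker of
  // the failed block; packets of later blocks stay queued.
  static void removeTrivialPackets(Queue<Packet> dataQueue) {
    while (!dataQueue.isEmpty()) {
      Packet p = dataQueue.poll();
      if (p.lastPacketInBlock) {
        break; // boundary of the failed block reached
      }
    }
  }

  public static void main(String[] args) {
    Queue<Packet> dataQueue = new ArrayDeque<>();
    dataQueue.add(new Packet(3, false)); // p3, failed block
    dataQueue.add(new Packet(4, true));  // p4 closes the failed block
    dataQueue.add(new Packet(5, false)); // p5, next block: must survive
    removeTrivialPackets(dataQueue);
    System.out.println("queue head: p" + dataQueue.peek().seqno); // p5
  }
}
{code}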

1.      Please see the explanation above.
2.      If you only write a file smaller than a block group, you won't hit the 
{{setFailed}} problem. The failed status should be marked and cleared by the 
data streamer itself, not by the output stream.
3.      When a streamer fails, it has to do some extra work to prepare for the 
next block, which is very difficult to achieve without overriding {{run}}. I 
will add a description to the JIRA summary later. When a packet fails to be 
sent, the following packets belonging to the same block become trivial; they 
just need to be removed from the {{dataQueue}}.
4.      I will update the patch after HDFS-8838 is committed.
5.      For the minor issues: if a streamer fails at some block, it still sends 
the coordinator an end block with a negative {{numBytes}}, as sketched below.
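
As a rough sketch of point 5 (all names here are illustrative stand-ins, not 
the real coordinator API), the failure can be signaled simply by negating the 
block length before handing the end block to the coordinator:

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of point 5: a failed streamer still hands the
// coordinator an end block, negating numBytes to flag the failure.
// EndBlock and the queue are stand-ins, not the real Coordinator classes.
public class EndBlockSketch {
  static class EndBlock {
    final long numBytes;
    EndBlock(long numBytes) {
      this.numBytes = numBytes;
    }
    boolean fromFailedStreamer() {
      return numBytes < 0;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<EndBlock> endBlocks = new LinkedBlockingQueue<>();
    long ackedBytes = 1_048_576L;             // bytes written before the failure
    endBlocks.put(new EndBlock(-ackedBytes)); // failed streamer negates the length
    EndBlock eb = endBlocks.take();
    System.out.println("streamer failed: " + eb.fromFailedStreamer()); // true
  }
}
{code}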


> Erasure Coding: client fails to write large file when one datanode fails
> ------------------------------------------------------------------------
>
>                 Key: HDFS-8704
>                 URL: https://issues.apache.org/jira/browse/HDFS-8704
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, 
> HDFS-8704-HDFS-7285-003.patch, HDFS-8704-HDFS-7285-004.patch, 
> HDFS-8704-HDFS-7285-005.patch
>
>
> I tested the current code on a 5-node cluster using RS(3,2).  When a datanode 
> is corrupt, the client succeeds in writing a file smaller than a block group 
> but fails to write a larger one. {{TestDFSStripeOutputStreamWithFailure}} only 
> tests files smaller than a block group; this jira will add more test 
> situations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
