[ 
https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Bo updated HDFS-8704:
------------------------
    Description: 
I tested the current code on a 5-node cluster using RS(3,2). When a datanode is 
corrupt, the client succeeds in writing a file smaller than a block group but 
fails to write a larger one. {{TestDFSStripeOutputStreamWithFailure}} only tests 
files smaller than a block group; this jira will add more test situations.
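
As an illustration of the missing coverage, a test along the following lines 
would write a file larger than one block group and fail a datanode mid-write. 
This is only a minimal sketch against {{MiniDFSCluster}}, not the test added by 
the attached patches; it assumes the RS(3,2) striped layout is already 
configured for the test path, and the class and constant names are invented for 
illustration.

{code:java}
// Hypothetical sketch only -- not the actual test in the attached patches.
// Assumes the RS(3,2) striped layout is already enabled for /ecTestDir.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Assert;
import org.junit.Test;

public class TestLargeStripedFileWithDNFailure {

  // One RS(3,2) block group spans 3 data blocks; write several times that.
  private static final int BLOCK_SIZE = 1024 * 1024;
  private static final int FILE_SIZE  = 4 * 3 * BLOCK_SIZE;

  @Test
  public void testWriteLargeFileWithOneDataNodeDown() throws Exception {
    Configuration conf = new Configuration();
    conf.setLong("dfs.blocksize", BLOCK_SIZE);
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(5).build();
    try {
      cluster.waitActive();
      DistributedFileSystem fs = cluster.getFileSystem();
      Path file = new Path("/ecTestDir/largeFile");
      byte[] data = new byte[FILE_SIZE];

      try (FSDataOutputStream out = fs.create(file)) {
        // Write the first half, then kill one datanode mid-write.
        out.write(data, 0, FILE_SIZE / 2);
        cluster.stopDataNode(0);
        out.write(data, FILE_SIZE / 2, FILE_SIZE - FILE_SIZE / 2);
      }

      // With RS(3,2), losing a single datanode must not fail the write.
      Assert.assertEquals(FILE_SIZE, fs.getFileStatus(file).getLen());
    } finally {
      cluster.shutdown();
    }
  }
}
{code}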


A streamer may encounter bad datanodes when writing the blocks allocated to it. 
When it fails to connect to a datanode or to send a packet, the streamer needs 
to prepare for the next block. First it removes the packets of the current 
block from its data queue. If the first packet of the next block is already in 
the data queue, the streamer resets its state and starts waiting for the next 
block allocated to it; otherwise it just waits for the first packet of the next 
block. While waiting, the streamer periodically checks whether it has been 
asked to terminate. A rough sketch of this recovery step is given below.
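
This is a standalone illustration of the recovery step, not the actual striped 
streamer code; every class, field, and method name in it 
({{StreamerRecoverySketch}}, {{dataQueue}}, {{prepareForNextBlock}}, ...) is an 
assumption made for illustration.

{code:java}
// Standalone sketch (not the actual HDFS streamer code) of the recovery step
// described above. All class, field, and method names are assumptions.
import java.util.ArrayDeque;
import java.util.Deque;

public class StreamerRecoverySketch {

  /** A queued packet, identified only by the block it belongs to. */
  static final class Packet {
    final long blockId;
    Packet(long blockId) { this.blockId = blockId; }
  }

  private final Deque<Packet> dataQueue = new ArrayDeque<>();
  private long currentBlockId;
  private boolean nextBlockAllocated;  // set when a new block is allocated to us
  private boolean terminated;          // set when the streamer is asked to stop

  /** Called when connecting to a datanode or sending a packet fails. */
  synchronized void prepareForNextBlock() throws InterruptedException {
    // 1. Remove the packets of the current (failed) block from the data queue.
    dataQueue.removeIf(p -> p.blockId == currentBlockId);

    if (headIsNextBlock()) {
      // 2a. The first packet of the next block is already queued: reset the
      //     streamer state and wait for the next block allocated to us.
      nextBlockAllocated = false;      // discard state tied to the old block
      while (!terminated && !nextBlockAllocated) {
        wait(1000);                    // wake up periodically to re-check
      }
    } else {
      // 2b. Otherwise just wait for the first packet of the next block.
      while (!terminated && !headIsNextBlock()) {
        wait(1000);                    // wake up periodically to re-check
      }
    }
    currentBlockId++;                  // move on to the next block
  }

  private boolean headIsNextBlock() {
    Packet head = dataQueue.peek();
    return head != null && head.blockId == currentBlockId + 1;
  }

  /** Writer side: enqueue a packet and wake the waiting streamer. */
  synchronized void enqueue(Packet p) {
    dataQueue.add(p);
    notifyAll();
  }

  /** Coordinator side: a new block has been allocated to this streamer. */
  synchronized void onNextBlockAllocated() {
    nextBlockAllocated = true;
    notifyAll();
  }

  /** Ask the streamer to stop waiting and terminate. */
  synchronized void terminate() {
    terminated = true;
    notifyAll();
  }
}
{code}

The writer thread would call {{enqueue}} and {{onNextBlockAllocated}}, while 
{{terminate}} models the periodic termination check that ends the wait.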


  was:I tested the current code on a 5-node cluster using RS(3,2). When a 
datanode is corrupt, the client succeeds in writing a file smaller than a block 
group but fails to write a larger one. {{TestDFSStripeOutputStreamWithFailure}} 
only tests files smaller than a block group; this jira will add more test 
situations.


> Erasure Coding: client fails to write large file when one datanode fails
> ------------------------------------------------------------------------
>
>                 Key: HDFS-8704
>                 URL: https://issues.apache.org/jira/browse/HDFS-8704
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, 
> HDFS-8704-HDFS-7285-003.patch, HDFS-8704-HDFS-7285-004.patch, 
> HDFS-8704-HDFS-7285-005.patch
>
>
> I tested the current code on a 5-node cluster using RS(3,2). When a datanode 
> is corrupt, the client succeeds in writing a file smaller than a block group 
> but fails to write a larger one. {{TestDFSStripeOutputStreamWithFailure}} only 
> tests files smaller than a block group; this jira will add more test 
> situations.
> A streamer may encounter bad datanodes when writing the blocks allocated to 
> it. When it fails to connect to a datanode or to send a packet, the streamer 
> needs to prepare for the next block. First it removes the packets of the 
> current block from its data queue. If the first packet of the next block is 
> already in the data queue, the streamer resets its state and starts waiting 
> for the next block allocated to it; otherwise it just waits for the first 
> packet of the next block. While waiting, the streamer periodically checks 
> whether it has been asked to terminate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
