[ 
https://issues.apache.org/jira/browse/HDFS-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985184#action_12985184
 ] 

Konstantin Boudnik commented on HDFS-1566:
------------------------------------------

Right, I believe manual reproduction isn't an issue (thanks for the tip about 
keeping logs along with the data; that should help for sure). 

After looking into this problem somewhat more, it became apparent that fault 
injection (FI) is unlikely to help. For one, an IOException for an out-of-space 
disk isn't specific enough, and it'd be hard to fine-tune the injection. I am 
not saying it isn't possible, but it seems to be much harder than trying to 
reproduce the issue at the system test level (e.g. with a loop device or an 
in-memory partition, etc.)
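
To illustrate the kind of check a system-level test could make once the
partition fills up, here is a minimal sketch. It is not HDFS code and does not
use the real edit log format: CappedFile is a hypothetical in-memory stand-in
for a file on a full partition, and the length-prefixed record layout is an
assumption made only for this example. It shows a write failing mid-record
with "No space left on device" and a reader that detects the torn tail.

```python
import errno
import io
import struct


class CappedFile:
    """Hypothetical in-memory stand-in for a file on a partition
    that has only `capacity` bytes free."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = io.BytesIO()

    def write(self, data):
        room = self.capacity - self.buf.tell()
        if len(data) > room:
            # Mimic a short write: keep the bytes that fit, then fail.
            self.buf.write(data[:room])
            raise OSError(errno.ENOSPC, "No space left on device")
        self.buf.write(data)


def append_record(f, payload):
    # Assumed record layout: 4-byte big-endian length, then the payload.
    f.write(struct.pack(">I", len(payload)) + payload)


def read_complete_records(raw):
    """Return the records that are fully present, stopping at a torn tail."""
    records, pos = [], 0
    while pos + 4 <= len(raw):
        (n,) = struct.unpack_from(">I", raw, pos)
        if pos + 4 + n > len(raw):
            break  # the last record was cut off mid-write
        records.append(raw[pos + 4 : pos + 4 + n])
        pos += 4 + n
    return records


def demo():
    f = CappedFile(capacity=40)
    ops = [b"mkdir /a", b"rename /a /b", b"rename /b /c"]
    written = 0
    try:
        for op in ops:
            append_record(f, op)
            written += 1
    except OSError as e:
        # The only hint about the cause is the errno on a generic error.
        assert e.errno == errno.ENOSPC
    raw = f.buf.getvalue()
    return written, raw, read_complete_records(raw)
```

In this sketch the third record is only partly written, so the file ends with
half of a rename. A real test would run the same kind of check against the
edits files on a small loop-device or in-memory partition.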

> Test that covers full partition  
> ---------------------------------
>
>                 Key: HDFS-1566
>                 URL: https://issues.apache.org/jira/browse/HDFS-1566
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: name-node
>    Affects Versions: 0.20.2
>            Reporter: Eli Collins
>            Assignee: Konstantin Boudnik
>             Fix For: 0.23.0
>
>
> We've seen the following bug; HDFS needs a test to reproduce it:
> * /var filled up
> * 2NN failed checkpoint due to no space left on device
> * NN log hit end of disk
> * NN seems to have exited on the spot, mid-log-message
> * NN edits are left corrupted
> ** Half of a rename made it into the log
> ** valid data appears to end on a sector boundary
> ** this is true across all of the edit dirs

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
