[ 
https://issues.apache.org/jira/browse/HDFS-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061443#comment-13061443
 ] 

Matt Foley commented on HDFS-2136:
----------------------------------

I'm no expert on this stuff, but here are my thoughts.  If the usual mock/spy 
approach, applied after the relevant object stack has been created, were 
sufficient, I would want to do #3.  The simplest implementation would be pairs 
of protected methods: one to fetch a critical object (perhaps the SD itself) 
and one to set it, i.e. replace it with a mocked or spied version.
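
To make the get/set idea concrete, here is a minimal sketch.  All class and 
method names (FakeableDir, ImageWriter, FailingDir, getDir/setDir) are 
hypothetical stand-ins, not the real HDFS classes, and a hand-written failing 
subclass is used in place of a mocking library so the fragment stands alone.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a storage directory; NOT the real HDFS class.
class FakeableDir {
    void write(String data) throws IOException {
        // Real code would write to disk; this sketch does nothing.
    }
}

// Hypothetical owner of the directory, exposing the protected get/set pair.
class ImageWriter {
    private FakeableDir dir = new FakeableDir();
    private final List<String> errors = new ArrayList<>();

    // Fetch the critical object so a test can wrap or inspect it.
    protected FakeableDir getDir() { return dir; }

    // Replace the critical object with a mocked or failing version.
    protected void setDir(FakeableDir replacement) { this.dir = replacement; }

    // Returns true on success; records the error and returns false on failure.
    boolean save(String data) {
        try {
            dir.write(data);
            return true;
        } catch (IOException e) {
            errors.add(e.getMessage());
            return false;
        }
    }

    List<String> getErrors() { return errors; }
}

// Test double that fails every write.
class FailingDir extends FakeableDir {
    @Override
    void write(String data) throws IOException {
        throw new IOException("injected write failure");
    }
}
```

A test in the same package would call setDir(new FailingDir()) and then check 
that save() reports the failure instead of propagating it.  Note that this 
only works once the ImageWriter exists, which is exactly the limitation for 
startup-time faults.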

However, many of the most interesting cases to test are during startup, when 
the interesting objects are still being created on the fly.  It may be that 
AspectJ is the right way to handle FI at this point, but I haven't used 
AspectJ.  Another way to do it would be passing a callback class through the 
conf (yuck! - but it would work).  Such a callback, if non-null, could be 
called at various key points in the read and write methods, and achieve "in 
vivo" FI.  I do suspect AspectJ would do this well, so I'm doing some reading.  
What do you think? Does this fit within your understanding of what the AOP FI 
framework can do?
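
For what it's worth, the callback-through-conf idea could look roughly like 
the sketch below.  Everything here is an assumption for illustration: 
java.util.Properties stands in for the Hadoop conf object, and the FaultHook 
interface, the "test.fi.hook.class" key, the EditsWriter class and the point 
names are all made up.

```java
import java.io.IOException;
import java.util.Properties;

// Hypothetical callback interface invoked at key points of read/write code.
interface FaultHook {
    void at(String point) throws IOException;
}

// Example hook that injects a failure just before a write.
class ThrowOnWriteHook implements FaultHook {
    @Override
    public void at(String point) throws IOException {
        if ("beforeWrite".equals(point)) {
            throw new IOException("injected fault at " + point);
        }
    }
}

// Hypothetical writer; Properties stands in for the real conf object.
class EditsWriter {
    // Made-up key naming the hook class; unset in production.
    static final String HOOK_KEY = "test.fi.hook.class";

    private final FaultHook hook; // null means no fault injection

    EditsWriter(Properties conf) throws ReflectiveOperationException {
        String name = conf.getProperty(HOOK_KEY);
        // Instantiate the hook reflectively only when the key is set.
        this.hook = (name == null)
            ? null
            : (FaultHook) Class.forName(name).getDeclaredConstructor().newInstance();
    }

    // Call the hook if one is configured; otherwise do nothing.
    private void fire(String point) throws IOException {
        if (hook != null) {
            hook.at(point);
        }
    }

    String write(String data) throws IOException {
        fire("beforeWrite");
        String result = "wrote:" + data; // placeholder for the real write
        fire("afterWrite");
        return result;
    }
}
```

Because the hook is created from the conf inside the constructor, it is in 
place while the object is still being built, so faults can be injected during 
startup.  The cost is a null check at each instrumented point in production 
code.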

By #2, do you mean "ex vivo" calls that would run a fragment of code, out of 
context, but with FI?  That would certainly be better than nothing, but would 
not give me as much confidence as #1 or #3 that the system would correctly 
handle a fault during startup.

> 1073: Fault injection for StorageDirectory failures during read/write of 
> FSImage/Edits files
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-2136
>                 URL: https://issues.apache.org/jira/browse/HDFS-2136
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Matt Foley
>
> Both HDFS-1955 and HDFS-2135 have observed that it is difficult to unit test 
> such failures.  As a result, regression of HDFS-1955 was only found by 
> careful manual review (thanks, atm!).  Since 1073 is making broad changes to 
> the way these files are read and written, and appropriately putting effort 
> into correct error handling, I propose we also make it possible to 
> auto-test that error handling.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
