[ 
https://issues.apache.org/jira/browse/MRUNIT-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227052#comment-13227052
 ] 

Jim Donofrio commented on MRUNIT-69:
------------------------------------

We use an annotation on the test class to specify the mapper/reducer classes 
because annotation values can be arrays, which PipelineMapReduceDriver 
requires and which isn't possible when the classes are declared in the 
generic type. Users could also easily leave the mapper/reducer classes out of 
the generic type.

We then need a way to read these annotations. Either we make all test classes 
extend DriverTest, so that this.getClass() gives a reference to the actual 
user class and we can read the annotations at runtime with reflection, or we 
use all static methods with thread locals and search 
Thread.currentThread().getStackTrace() for the annotated class. We may want 
to use thread locals even if we extend DriverTest, because some people may 
want to use the concurrent JUnit runners, which can run multiple test methods 
in parallel.
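
As a rough illustration of the first option, reading a class-level annotation 
through this.getClass() could look something like the sketch below. The 
MapTest annotation, the DriverTest base class and the mapperClasses() helper 
are hypothetical names for this proposal, not existing MRUnit API, and 
IdentityMapper here is only a stand-in so the sketch compiles without Hadoop 
on the classpath.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class AnnotationLookupSketch {

  // Hypothetical annotation. The array-typed value is what lets a pipeline
  // test list several mapper/reducer classes.
  @Retention(RetentionPolicy.RUNTIME) // must be RUNTIME to be visible to reflection
  @Target(ElementType.TYPE)
  public @interface MapTest {
    Class<?>[] value();
  }

  // Hypothetical superclass that every test class would extend.
  public abstract static class DriverTest {
    // getClass() returns the concrete user test class, so the annotation
    // placed on that class can be read at runtime.
    public Class<?>[] mapperClasses() {
      MapTest annotation = getClass().getAnnotation(MapTest.class);
      if (annotation == null) {
        throw new IllegalStateException(
            getClass().getName() + " is missing @MapTest");
      }
      return annotation.value();
    }
  }

  // Stand-in for a real mapper class (assumption, keeps the sketch stdlib-only).
  public static class IdentityMapper {}

  @MapTest(IdentityMapper.class)
  public static class IdentityMapperTest extends DriverTest {}

  public static void main(String[] args) {
    Class<?>[] mappers = new IdentityMapperTest().mapperClasses();
    System.out.println(mappers[0].getSimpleName());
  }
}
```

The static alternative would replace getClass() with a walk over 
Thread.currentThread().getStackTrace(), loading each frame's class until one 
carrying the annotation is found.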

Lastly, we use a @Before method in the DriverTest superclass to clean up the 
state, and an @After method makes the runTest call, so an example would be:

@MapTest(IdentityMapper.class)
public class IdentityMapperTest {

  @Test
  public void testMap() {
    addConfParam(key, value);
    addInput(key, value);
    addInput(key, value);
    addCounter(key, value);
    addOutput(key, value);
    addMultipleOutput(prefix, key, value);
  }

}
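
To make the lifecycle concrete, here is a minimal sketch of what the 
superclass might do around each test method. It is only an assumption about 
how this could be wired up: reset() stands in for the @Before method and 
runTest() for the @After method (the JUnit annotations are left off so the 
sketch runs with the JDK alone), and the mapper is modelled as a plain 
Function rather than a Hadoop Mapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class LifecycleSketch {

  // Hypothetical base class. Under JUnit 4, reset() would carry @Before and
  // runTest() would carry @After.
  public abstract static class DriverTest<T> {
    private final List<T> inputs = new ArrayList<>();
    private final List<T> expected = new ArrayList<>();

    // The code under test, simplified to a function for this sketch.
    protected abstract Function<T, T> mapper();

    // Would be @Before: clear state left over from the previous test method.
    public void reset() {
      inputs.clear();
      expected.clear();
    }

    public void addInput(T value) {
      inputs.add(value);
    }

    public void addOutput(T value) {
      expected.add(value);
    }

    // Would be @After: run the mapper over the recorded inputs and compare
    // the results with the recorded expected outputs.
    public void runTest() {
      List<T> actual = new ArrayList<>();
      for (T input : inputs) {
        actual.add(mapper().apply(input));
      }
      if (!actual.equals(expected)) {
        throw new AssertionError("expected " + expected + " but got " + actual);
      }
    }
  }

  public static class IdentityTest extends DriverTest<String> {
    @Override
    protected Function<String, String> mapper() {
      return Function.identity();
    }

    // The test method only records inputs and expectations.
    public void testMap() {
      addInput("a");
      addOutput("a");
    }
  }

  public static void main(String[] args) {
    IdentityTest test = new IdentityTest();
    test.reset();   // what the @Before method would do
    test.testMap(); // the user's test method
    test.runTest(); // what the @After method would do
    System.out.println("testMap passed");
  }
}
```

The state is kept in plain instance fields here; with a parallel runner it 
would presumably have to move into thread locals as discussed above.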

The annotated example above eliminates the driver, the need to specify the 
mapper/reducer class in every method, and the run call. I don't think it could 
be much simpler than this. We could also allow multiple key/value pairs into a 
given mapper for MRUNIT-64, or multiple keys into a reducer, with a method 
that takes a varargs value.
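
One possible shape for such a varargs method is sketched below. The class 
name and the addInput(key, values...) signature are assumptions made for 
illustration only; calling the method once per key covers the multiple-keys 
case for a reducer.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class VarargsInputSketch<K, V> {

  // Each entry is one key together with all values supplied for it.
  private final List<Map.Entry<K, List<V>>> inputs = new ArrayList<>();

  // Hypothetical signature: one key followed by any number of values.
  // @SafeVarargs is valid because the array is only read, never stored.
  @SafeVarargs
  public final void addInput(K key, V... values) {
    List<V> copy = new ArrayList<>(Arrays.asList(values));
    inputs.add(new SimpleEntry<>(key, copy));
  }

  public List<Map.Entry<K, List<V>>> inputs() {
    return inputs;
  }

  public static void main(String[] args) {
    VarargsInputSketch<String, Integer> reducerInputs = new VarargsInputSketch<>();
    reducerInputs.addInput("a", 1, 2, 3); // one key with three values
    reducerInputs.addInput("b", 4);       // a second key
    System.out.println(reducerInputs.inputs());
  }
}
```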

We could put all this in a new package structure and deprecate the old API, 
which would fix MRUNIT-76.
                
> longterm plan for MRUNIT development?
> -------------------------------------
>
>                 Key: MRUNIT-69
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-69
>             Project: MRUnit
>          Issue Type: Brainstorming
>            Reporter: Jim Donofrio
>            Priority: Minor
>
> So I am curious what the plan is for the longterm future of MRUNIT?
> I think MRUNIT is currently useful for unit testing a single mapper or 
> reducer, but there is a void for testing more complicated features such as 
> MultipleInputs, MultipleOutputs, a driver class, and counters, among other 
> things. I wonder if, instead of adding support to the current MRUNIT 
> framework for these extra features, it would be more useful to add hooks to 
> the existing LocalJobRunner and MiniMRCluster classes to provide methods to 
> more easily verify file output from text files, sequence files, etc. This 
> would allow MRUNIT to test driver classes, MultipleInputs, MultipleOutputs, 
> etc. MRUNIT would also then test against the real Hadoop code instead of an 
> implementation that mimics Hadoop, which can miss some bugs, such as the 
> ReduceDriver that did not reuse the same object until 0.8.0. MRUNIT would 
> also keep up with new MapReduce features instead of us having to implement 
> fake versions of them.
> I understand that performance would be an issue due to the file I/O, but I 
> wonder how fast the LocalJobRunner would be if we wrote a new class that 
> extended FileSystem to allow users to write out fake files to memory and 
> make the LocalJobRunner read from them.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
