I'm wondering whether MRUnit could be the place to put other Hadoop
testing code. Specifically:
== JUnit on multiple hosts ==
I have some prototype code that executes JUnit test cases as MR jobs and
collects the results (including serialized throwables). It runs one test
per line of input text (the name of the package). It might be better to
support lines that carry both tests and config options, or other ways to
explore the config space. I'd also really like to be able to deploy the
JUnit tests to all the workers in the cluster; the reduction would then
identify which boxes are playing up.
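A rough sketch of what that reduction might look like, written in plain Java without the Hadoop API so it stands alone. The record layout (host, test name and PASS/FAIL separated by tabs) and the class name `HostFailureSummary` are made up for illustration; they are not taken from the prototype.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical reduce-side logic, not the prototype's actual code.
 * Each record is assumed to look like "host<TAB>testName<TAB>PASS|FAIL".
 */
final class HostFailureSummary {

    private HostFailureSummary() {}

    /** Counts failed tests per host; hosts with no failures are listed with 0. */
    static Map<String, Integer> failuresByHost(List<String> records) {
        Map<String, Integer> failures = new TreeMap<>();
        for (String record : records) {
            String[] fields = record.split("\t");
            if (fields.length != 3) {
                throw new IllegalArgumentException("Malformed record: " + record);
            }
            int failed = "FAIL".equals(fields[2]) ? 1 : 0;
            failures.merge(fields[0], failed, Integer::sum);
        }
        return failures;
    }

    /** Returns the hosts with at least one failed test, in sorted order. */
    static List<String> suspectHosts(List<String> records) {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : failuresByHost(records).entrySet()) {
            if (entry.getValue() > 0) {
                suspects.add(entry.getKey());
            }
        }
        return suspects;
    }
}
```

In a real job the host name would presumably be the reduce key, so each reducer call would see the results for a single box; the grouping is done by hand here only to keep the sketch self-contained.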
== Sampling for testing ==
Good desktop tests need real data, which means sampling from the live
datasets. Some standard MR jobs to do the sampling (which would
themselves use MRUnit to self-test) could make that easier.
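One way such a sampling job could pick its records is reservoir sampling, which keeps a fixed-size uniform sample without knowing the dataset size up front. This is only a sketch of that technique in plain Java; the class name `ReservoirSampler` is hypothetical, and nothing here implies that sampling jobs must work this way.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: uniform fixed-size sampling over a stream of records. */
final class ReservoirSampler {

    private ReservoirSampler() {}

    /** Returns up to k items chosen uniformly at random from the iterator. */
    static <T> List<T> sample(Iterator<T> items, int k, Random random) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (items.hasNext()) {
            T item = items.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k items.
                reservoir.add(item);
            } else {
                // Pick a slot in [0, seen); the item survives with probability k / seen.
                long slot = (long) (random.nextDouble() * seen);
                if (slot < k) {
                    reservoir.set((int) slot, item);
                }
            }
        }
        return reservoir;
    }
}
```

Passing in the `Random` lets a test fix the seed, which is the kind of thing an MRUnit self-test for a sampling job would probably want.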
Thoughts?