Nigel Daley wrote:
How will unit tests be divided? For instance, will all three have to have MiniDFSCluster and other shared test infrastructure?
The HDFS project can release an hdfs-test.jar file that contains MiniDFSCluster. This will be used by mapred tests. Similarly, mapred will release a mapred-test.jar that contains MiniMRCluster, which can be used by hdfs tests. There is a circular dependency, but only in the test code, not in the mapred or hdfs code itself. This is easy to enforce, since test code is not on the classpath when we compile non-test code.
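As a sketch of how such a test-only dependency could be declared (Maven-style coordinates here are hypothetical, not actual Hadoop artifact names or versions), the key point is the `test` scope, which keeps the jar off the compile classpath:

```xml
<!-- Hypothetical sketch: mapred's tests depending on HDFS's published
     test jar. Artifact names and version are assumptions for
     illustration only. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>X.Y.Z</version>
  <type>test-jar</type>  <!-- the separately published test artifact,
                              e.g. containing MiniDFSCluster -->
  <scope>test</scope>    <!-- visible only when compiling/running tests,
                              so the circularity never reaches the
                              main hdfs or mapred code -->
</dependency>
```

With both projects publishing their test jars this way, the mutual dependency exists only between test artifacts, which is exactly the situation described above.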
-1 until I better understand the benefit of making the split.
One benefit is that developers would spend less time reading messages about areas they're not interested in. The core-dev mailing list traffic is becoming unmanageable. Splitting the mailing lists without splitting the project would mean that a split developer community would attempt to build a coherent product, which sounds dangerous.
Another benefit is that it would increase the separation of these technologies, so that, e.g., folks could more easily run different versions of mapreduce on top of different versions of HDFS. Currently we make no such guarantees. Folks would be able to upgrade to, e.g., the next release of mapreduce on a subset of their cluster without upgrading their HDFS. That's not currently supported. As we move towards splitting mapreduce into a scheduler and runtime, where folks can specify a different runtime per job, this will be even more critical.
We need to make this split eventually. Why not now?

Doug
