Owen O'Malley wrote:
On Wed, Jul 1, 2009 at 6:45 PM, Todd Lipcon <tlip...@gmail.com> wrote:
Agree with Phillip here. Requiring a new jar to be checked in anywhere after
every common commit seems unscalable and nonperformant. For git users this
will make the repository size balloon like crazy (the jar is 400KB and we
have around 5300 commits so far = 2GB!).

This is silly. Obviously, just like the source, the jars compress very
well across versions.

I think it would be reasonable to require that developers check out a
structure like:

working-dir/
 hadoop-common/
 hadoop-mapred/
 hadoop-hdfs/

-1 They are separate subprojects. In the medium term, mapreduce and
hdfs should compile and run against the released version of common.
Checking in the jars is a temporary step while the interfaces in
common stabilize. Furthermore, I expect the commit volume in common to
be much lower than in mapreduce or hdfs.


There are various use cases here:

-people working in hdfs who don't need mapred (though they should have it for regression testing their work) but do need a stable common
-people working in mapred who need a working common and hdfs
-someone trying to work across all three (or in common, which is effectively the same thing from a regression testing viewpoint)
-someone who just wants all the code for debugging/using mapreduce or other bits of Hadoop

For anyone who is playing at the source level, where they are picking up changing libraries, having the separate projects in subdirectories with common build targets is invaluable; Ivy can do the glue. But at the same time, should everyone working on mapred be required to pull down and build common and hdfs?
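
To make the "ivy can do the glue" point concrete, something like the following in the hdfs project's ivy.xml would pull common in as a resolved artifact instead of a checked-in jar. The organisation/module names and the rev value here are only illustrative guesses, not what the actual build files use:

  <ivy-module version="2.0">
    <!-- illustrative module coordinates, not the real build files -->
    <info organisation="org.apache.hadoop" module="hadoop-hdfs"/>
    <dependencies>
      <!-- resolve the latest published common jar at build time,
           rather than committing the binary into the source tree -->
      <dependency org="org.apache.hadoop" name="hadoop-core"
                  rev="latest.integration"/>
    </dependencies>
  </ivy-module>

With that kind of setup, someone tracking common at HEAD could publish a snapshot into their local Ivy cache and let the resolver pick it up, while everyone else builds against the last released version.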
