On Aug 15, 2008, at 10:03 PM, Nigel Daley wrote:
>> Another benefit is that it would increase the separation of these
>> technologies, so that, e.g., folks could more easily run different
>> versions of mapreduce on top of different versions of HDFS.
>> Currently we make no such guarantees. Folks would be able to
>> upgrade to, e.g., the next release of mapreduce on a subset of
>> their cluster without upgrading their HDFS. That's not currently
>> supported. As we move towards splitting mapreduce into a scheduler
>> and runtime, where folks can specify a different runtime per job,
>> this will be even more critical.
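
To picture the "different runtime per job" idea above, it could surface as a single job-level setting. The property key below is purely hypothetical, invented for illustration only; no such key existed in Hadoop at the time:

<!-- Hypothetical job configuration: "mapred.runtime.version" is an
     invented key, shown only to sketch how a job might select a
     Map/Reduce runtime once scheduler and runtime are separated. -->
<configuration>
  <property>
    <name>mapred.runtime.version</name>
    <value>0.19.0</value>
  </property>
</configuration>
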
> Sounds like we simply need to create separate jar files for these
> different components. This can be done in the current project.
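
As a sketch of what cutting separate jars from the current tree could look like: the Ant fragment below splits along package lines, assuming the 0.18-era layout (HDFS under org.apache.hadoop.dfs, Map/Reduce under org.apache.hadoop.mapred). The target, property, and jar names are invented for illustration, not the project's actual build targets:

<!-- Illustrative only: cut per-component jars from one compiled tree.
     Assumes an existing "compile" target and the build.dir /
     build.classes properties; all names here are made up. -->
<target name="jar-split" depends="compile">
  <!-- HDFS classes only -->
  <jar destfile="${build.dir}/hadoop-hdfs.jar"
       basedir="${build.classes}"
       includes="org/apache/hadoop/dfs/**"/>
  <!-- Map/Reduce classes only -->
  <jar destfile="${build.dir}/hadoop-mapred.jar"
       basedir="${build.classes}"
       includes="org/apache/hadoop/mapred/**"/>
  <!-- Shared kernel (fs, io, ipc, conf, ...) both depend on -->
  <jar destfile="${build.dir}/hadoop-core.jar"
       basedir="${build.classes}"
       excludes="org/apache/hadoop/dfs/**,org/apache/hadoop/mapred/**"/>
</target>

Note that separate jars by themselves only change packaging; they don't decouple release cycles or add the cross-version guarantees the quoted text argues for.
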
> Wouldn't the amount of effort to make this split and get it right be
> better spent on getting all components of Hadoop to 1.0 (API
> stability)? The proposal feels like a distraction to me at this
> point in the project.
>
> Nige

I'd like to retract the -1 vote that I gave this proposal earlier.

One compelling reason (for me) to split HDFS and Map/Reduce into
separate sub-projects is that (hopefully) the *configs* for each layer
will be clearer and simpler.
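
For instance, at the time all of these settings lived together in a single hadoop-site.xml; a per-layer split might look like the sketch below. The file names are invented for this example, though the property names are real ones from that era:

<!-- hdfs-site.xml (illustrative name): HDFS-only settings -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/hadoop/name</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

<!-- mapred-site.xml (illustrative name): Map/Reduce-only settings -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
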
So I'm now +1 on this proposal.

Nige