On Aug 15, 2008, at 10:03 PM, Nigel Daley wrote:

Another benefit is that it would increase the separation of these technologies, so that, e.g., folks could more easily run different versions of mapreduce on top of different versions of HDFS. Currently we make no such guarantees. Folks would be able to upgrade to, e.g., the next release of mapreduce on a subset of their cluster without upgrading their HDFS. That's not currently supported. As we move towards splitting mapreduce into a scheduler and runtime, where folks can specify a different runtime per job, this will be even more critical.

Sounds like we simply need to create separate jar files for these different components. That can be done within the current project, without splitting it.
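To illustrate, a minimal sketch of what per-component jars might look like in the existing Ant build (the target names, property names, and jar names here are illustrative assumptions, not the project's actual build.xml):

    <!-- Hypothetical Ant targets producing one jar per component from the
         same source tree; names and properties are assumptions. -->
    <target name="hdfs-jar" depends="compile">
      <jar destfile="${build.dir}/hadoop-hdfs-${version}.jar">
        <fileset dir="${build.classes}" includes="org/apache/hadoop/hdfs/**"/>
      </jar>
    </target>

    <target name="mapred-jar" depends="compile">
      <jar destfile="${build.dir}/hadoop-mapred-${version}.jar">
        <fileset dir="${build.classes}" includes="org/apache/hadoop/mapred/**"/>
      </jar>
    </target>

Each target just packages one package subtree, so downstream users could depend on only the component they need while the codebase stays in one project.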

Wouldn't the effort required to make this split and get it right be better spent on getting all components of Hadoop to 1.0 (API stability)? The proposal feels like a distraction to me at this point in the project.

Nige

I'd like to retract the -1 vote that I gave this proposal earlier. One compelling reason (for me) to split HDFS and Map/Reduce into separate sub-projects is that (hopefully) the *configs* for each layer will be clearer and simpler.
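For instance, a split could let each layer ship its own config file instead of everything landing in one monolithic hadoop-site.xml. A rough sketch, assuming an obvious per-layer file naming convention (the file names and values below are assumptions; the property names are the current ones):

    hdfs-site.xml (HDFS-only settings):

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
    </configuration>

    mapred-site.xml (Map/Reduce-only settings):

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>jt.example.com:9001</value>
      </property>
    </configuration>

An operator tuning the job tracker would then never have to wade through (or risk breaking) the HDFS settings, and vice versa.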

So I'm now +1 on this proposal.

Nige
