On Dec 7, 2010, at 2:23 AM, Jeff Hammerbacher wrote:

> To the best of my knowledge, Owen, your organization requires users to
> petition a committee before writing MapReduce jobs.

I'd appreciate it if we could keep the discussion technical and not resort to snide comments. Thank you.

> Users of Hadoop have moved beyond MapReduce. The community would be far
> better served by a compact, reliable, and efficient kernel. That's the
> project direction Doug has suggested for MapReduce, and it's one that Eric
> and Tom have supported. I also support this direction for the project.


This would be a great discussion to have if Doug would start it, rather than putting forward his word as law.

However, this is not germane to the discussion at hand.

The discussion at hand is simple: Doug has vetoed this patch for two reasons:
a) dependency on PB
b) extension to SequenceFile

a) is technical; b) isn't. This discussion is about b).

> I'd be ecstatic to see this discussion result in moving the file formats, input and output formats, and other library code out to a separate Apache project or GitHub, where they can evolve rapidly based on user needs, so that the MapReduce project can begin to address some of the outstanding issues with the framework itself.

Again, no one is proposing new file formats here. SequenceFile is an important file format for several reasons:


- It's been bundled with Hadoop for nearly 5 years now
- Several users store petabytes of data on it

Blocking extensions to SequenceFile is unreasonable; as several folks have noted, there is no *technical* reason to do so.

People are welcome to start any number of file formats and input/output libraries, either in Apache or outside it; no one is proposing otherwise.

Arun
