On Dec 7, 2010, at 2:23 AM, Jeff Hammerbacher wrote:
To the best of my knowledge, Owen, your organization requires users to
petition a committee before writing MapReduce jobs.
I'd appreciate it if we could keep the discussion technical and not
resort to snide comments. Thank you.
Users of Hadoop have moved beyond MapReduce. The community would be far
better served by a compact, reliable, and efficient kernel. That's the
project direction Doug has suggested for MapReduce, and it's one that Eric
and Tom have supported. I also support this direction for the project.
This is a great discussion to have, if Doug could start it, rather
than put forward his word as the law.
However, this is not germane to the discussion at hand.
The discussion at hand is simple: Doug has vetoed this patch for 2 reasons:
a) dependency on PB
b) extension to SequenceFile
a) is technical; b) isn't. This discussion is about b).
I'd be ecstatic to see this discussion result in moving the file formats,
input and output formats, and other library code out to a separate Apache
project or Github where they can evolve rapidly based on user needs, so that
the MapReduce project can begin to address some of the outstanding issues
with the framework itself.
Again, no one is proposing new file formats here. SequenceFile is an
important file format for several reasons:
- It's been bundled with Hadoop for nearly 5 years now
- Several users store petabytes of data on it
Blocking extensions to SequenceFile is unreasonable. As has been noted
by several folks, there is no *technical* reason to do that.
People are welcome to start any number of file formats and input/output
libraries, either in Apache or outside; no one is proposing otherwise.
Arun