On 06/08/2011 05:13 PM, Andrew Purtell wrote:
I can tell you feel I'm picking on HBase, especially in light of my
flat out rejection of the "we want to mmap() blocks" case.

I for one understand the objection there.

Although it does negatively impact the work of a recent promising new 
contributor. As a project, HBase suffers for it. Of course that is no concern 
of HDFS.

On the other hand I do believe Todd has a point. MapReduce is perhaps the only 
constituency that HDFS really cares about. Any reasonable person would come to 
that conclusion after surveying submitted JIRAs and their resolution times (or 
not). Historically with HDFS the local itch -- the concern of the big MapReduce 
shops -- gets the scratch, and other constituencies count for little. As a result there is 
unfortunate business that lingers today -- Facebook, StumbleUpon, Trend Micro, 
and others have effectively forked HDFS (0.20) in house for use with HBase, and 
nobody I know is seriously considering using HDFS 0.22 or TRUNK due to a lack 
of evidence that anyone with a stake in it is running it in production at 
scale. Past discussion to mend the breach with an HBase-friendly release of 
HDFS 0.20 ended with what I would describe as an inflexible and legalistic air.


Well, today MR is the primary constituency, but to be a stack you do have to make the other layers work: MR, with Hive and Pig on top, HBase, Mahout.

These extra layers can form part of the regression tests for the underlying code: if a change breaks HBase or Hive, that's something you want to catch early, so you can say "this change to hadoop-common broke it".

Yes, it's extra hassle dealing with changes that break things, but you find the problems so end users don't have to. And Jenkins can be set up to do much of the work: tweak the dependencies of the downstream projects to use the svn.trunk or -SNAPSHOT version of your code, run the builds in the right order to generate the artifacts, and wait for the emails to come in.
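The setup described above can be sketched as a build script (a minimal, hypothetical sketch, assuming Maven-based builds: the project directories, the `hadoop.version` property, and the version string are illustrative assumptions, not real coordinates):

```shell
#!/bin/sh
# Hypothetical sketch of the downstream-regression setup described above.
# Assumes Maven builds and local checkouts under ~/src; project names,
# the hadoop.version property, and VERSION are illustrative only.
set -e

VERSION=0.23.0-SNAPSHOT

# 1. Build the upstream layer first and install its -SNAPSHOT artifacts
#    into the local repository so downstream builds can resolve them.
cd ~/src/hadoop-common
mvn clean install -DskipTests

# 2. Build each downstream project in dependency order, with its Hadoop
#    dependency pointed at the SNAPSHOT just installed (e.g. via a
#    version property in its pom).
for project in hdfs mapreduce hbase hive pig; do
  cd ~/src/$project
  mvn clean test -Dhadoop.version=$VERSION
done
```

In Jenkins, each step would be a job chained to the previous one, so a break in HBase or Hive sends mail as soon as the offending commit lands.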

-steve
