I'm neither an HBase user (just yet) nor a contributor, so my opinion isn't
really worth a whole lot here.

But I see HBase as being more similar to MapReduce than to ZK or Avro as far
as becoming a top-level project.  Theoretically you can plug in alternate
filesystems but in reality, both systems run on HDFS as of now and might run
on other stuff in the future.  I agree that there's sometimes been a lack of
urgency with regard to HDFS patches that affect HBase but not MapReduce --
but I think HBase leaving the project wouldn't really help, and could hurt
both HBase and HDFS.

In other words, HDFS needs a tenant like HBase to push the use cases that
MapReduce doesn't cover -- if there are problems with communication between
subprojects or with HDFS committer priorities, we should address those
issues rather than split HBase off and amplify the distance.  With MapReduce
and HBase both stretching the capabilities, HDFS can continue to evolve into
being a (the?) robust, performant, mature distributed filesystem.  If it
only optimizes for one use case, then it's just a niche I/O layer for
MapReduce.

So I guess my opinion is, "stay, and be more annoying" :)  But in a good
way.


On Thu, Mar 18, 2010 at 3:09 PM, Jonathan Gray <jg...@facebook.com> wrote:

> I would like to see HBase support alternative filesystems in the future.
>  There have been talks of other up and coming DFSs that were built more for
> random access that might make sense for some use cases.  I imagine a time
> down the road where there would be a choice of DFS depending on a particular
> use case.
>
> Users coming from the Hadoop world who would be utilizing both and likely
> be more tuned towards analytics would just add HBase atop Hadoop.  Someone
> coming from a relational database who is interested in fast read/write
> random access might be able to choose a DFS more closely suited to that use
> case.  Hopefully HDFS gets better at this so it could be the leader across
> the board, but I don't think we should necessarily be married to it.
>  Besides possible differences in append APIs, in general, it should not be
> difficult to plug a different DFS in (and it's been done in the past with
> kfs).
>
> While it would be nice if active HBase committers were eventually made into
> Hadoop PMC committers, to this point this has not happened (I believe stack
> was already on Hadoop PMC when HBase became a sub-project).  When we want to
> add a new committer we now have to build a case to people who actually have
> no community insight rather than allowing our community (which I believe is
> big enough to support itself) to make their own decisions.
>
> Also, I've not seen Stack's presence on the Hadoop PMC in any way
> contribute to the likelihood of an HDFS patch getting committed.
>
> That being said, we would not want to create any bad blood w/ the Hadoop
> community.  Dhruba, do you think that is a risk?
>
> JG
>
> > -----Original Message-----
> > From: Dhruba Borthakur [mailto:dhr...@gmail.com]
> > Sent: Thursday, March 18, 2010 11:08 AM
> > To: hbase-dev@hadoop.apache.org
> > Subject: Re: [DISCUSS] HBase as Apache top-level project?
> >
> > Hi Stack,
> >
> > Can HBase (in theory) be used on filesystems/MR other than Hadoop?
> >
> > I see one primary disadvantage of moving away from the Hadoop project.
> > Please let me explain. In the Hadoop world, if a committer is actively
> > contributing code, she/he becomes part of the Hadoop PMC. This means that
> > active HBase committers would (over time) become Hadoop PMC members.
> > This might allow HBase-related fixes to get into HDFS much more easily. If
> > HBase moves away from Hadoop, then HBase developers will not have a part to
> > play in guiding HDFS to make it more amenable to HBase usage.
> >
> > The case is different for ZK and Avro. They are not related to Hadoop
> > HDFS/MR at all.
> >
> > I am not voting against this proposal, just laying out my viewpoint.
> >
> > thanks,
> > dhruba
> >
> >
> > On Thu, Mar 18, 2010 at 10:43 AM, Stack <st...@duboce.net> wrote:
> >
> > > On Thu, Mar 18, 2010 at 10:15 AM, Andrew Purtell <apurt...@apache.org> wrote:
> > > >
> > > > HBase is an integrated optional part of a Hadoop stack more
> > > > than a standalone component, but other ASF TLPs build on top
> > > > of other projects. I suppose HDFS and ZK are going to be TLPs
> > > > at some point also, is that true? Leaving Hadoop as just the
> > > > MR framework?
> > >
> > > If the board allows us to be a TLP, ZooKeeper would probably be made a
> > > TLP at the same time.
> > >
> > > There hasn't been a vote, but it seems that the thought is that HDFS
> > > would stay within the hadoop fold; i.e. hdfs+mapreduce+common would
> > > stay.
> > >
> > > >
> > > > Anyway, what I like is HBase will stand on its own merits.
> > > >
> > > > What are the risks of being a TLP?
> > > >
> > >
> > > I'm sure there are some but I'm blinded by the upside at the moment.
> > >
> > > St.Ack
> > >
> >
> >
> >
> > --
> > Connect to me at http://www.facebook.com/dhruba
>
