One other thought is that in the not-so-distant future HBase may take on 
subprojects of its own.

> -----Original Message-----
> From: Jonathan Gray [mailto:jg...@facebook.com]
> Sent: Thursday, March 18, 2010 5:00 PM
> To: hbase-dev@hadoop.apache.org
> Subject: RE: [DISCUSS] HBase as Apache top-level project?
> 
> Isn't the hard spot where we've always been?  :)
> 
> Annoyance has really not gotten us anywhere.  And I don't think it
> matters to those in Hadoop whether we are a TLP or SP, they will not
> (or should not) be offended if we break off.  Do you think they would
> take us (or our patches) less seriously if we were a TLP?
> 
> What has pushed things forward is continuing to make HBase better so
> that more people want to use it.  A larger community and involvement
> from larger companies will help push Hadoop changes aimed at HBase,
> especially when those companies are Hadoop contributors.
> 
> 
> I think being a TLP is good because it gives us autonomy, more
> visibility, and some kind of external validation from Apache that HBase
> has risen to that level (which I believe it has).  I see the risks as
> not too serious.
> 
> If we do think we can get some HBase committers onto the Hadoop PMC,
> and we think that this will make a material difference in outcomes for
> us, then my opinion may change.  Today I don't really think the issue
> is whether we are on the Hadoop PMC or not... my understanding is that
> big decisions are not voted on for a majority, if someone votes against
> it then it is tabled.
> 
> JG
> 
> > -----Original Message-----
> > From: saint....@gmail.com [mailto:saint....@gmail.com] On Behalf Of
> > Stack
> > Sent: Thursday, March 18, 2010 4:09 PM
> > To: hbase-dev@hadoop.apache.org
> > Subject: Re: [DISCUSS] HBase as Apache top-level project?
> >
> > On Thu, Mar 18, 2010 at 1:07 PM, Jonathan Gray <jg...@facebook.com>
> > wrote:
> > > Will HDFS patches aimed at helping the HBase use case (which is not
> > > strictly limited to HBase but rather our pattern that differs from
> > > MR) be any less likely to get pushed through if we become a TLP
> > > rather than sub-project?  In reality I don't think the distinction
> > > makes a practical difference in that sense.
> > >
> >
> > If there are hbase-friendly committers up in hadoop they can marshall
> > through hbase-friendly patches.  Then whether we're under hadoop or
> > TLP matters less (though I do think Jay Booth has a good point when
> > he suggests that the best way to make the case for the hbase hdfs
> > access pattern is to "stay, and be more annoying...").
> >
> > Currently we have only one hbase committer who is also a committer in
> > hadoop and the path to more than this is involved if we move out from
> > under hadoop, Dhruba's point (It's just been confirmed that an hbase
> > committer of a year or so vintage qualifies as a nominee to hadoop
> > pmc).
> >
> >
> > > The things that will really help push the HDFS+HBase relationship
> > > are things like committers of HDFS being users or contributors of
> > > HBase.  Recent interest from Facebook and Cloudera, who each have
> > > multiple committers to Hadoop, has really pushed things along nicely
> > > in recent weeks.
> > >
> >
> > This is true.  It's for sure made more difference than that one
> > hbase-friendly committer has done during his tenure as a hadoop
> > committer.
> >
> > The downside though is that there is nothing to stop the above
> > companies changing their minds and then a TLP hbase would be in a
> > hard spot.
> >
> > St.Ack
> >
> >
> > > JG
> > >
> > >> -----Original Message-----
> > >> From: Jay Booth [mailto:jaybo...@gmail.com]
> > >> Sent: Thursday, March 18, 2010 12:45 PM
> > >> To: hbase-dev@hadoop.apache.org
> > >> Subject: Re: [DISCUSS] HBase as Apache top-level project?
> > >>
> > >> I'm neither an HBase user (just yet) nor a contributor so my
> > >> opinion isn't really worth a whole lot here..
> > >>
> > >> But I see HBase as being more similar to MapReduce than to ZK or
> > Avro
> > >> as far
> > >> as becoming a top-level project.  Theoretically you can plug in
> > >> alternate
> > >> filesystems but in reality, both systems run on HDFS as of now and
> > >> might run
> > >> on other stuff in the future.  I agree that there's sometimes been
> a
> > >> lack of
> > >> urgency with regard to HDFS patches that affect HBase but not
> > Mapreduce
> > >> --
> > >> but I think HBase leaving the project wouldn't really help, and
> > could
> > >> hurt
> > >> both HBase and HDFS.
> > >>
> > >> In other words, HDFS needs a tenant like HBase to push the use
> cases
> > >> that
> > >> MapReduce doesn't cover -- if there are problems with
> communication
> > btw
> > >> subprojects or with HDFS committer priorities, we should address
> > those
> > >> issues rather than split HBase off and amplify the distance.  With
> > >> MapReduce
> > >> and HBase both stretching the capabilities, HDFS can continue to
> > evolve
> > >> into
> > >> being a (the?) robust, performant, mature distributed filesystem.
> >  If
> > >> it
> > >> only optimizes for one use case, then it's just a niche i/o layer
> > for
> > >> mapreduce.
> > >>
> > >> So I guess my opinion is, "stay, and be more annoying" :)  But in
> a
> > >> good
> > >> way.
> > >>
> > >>
> > >> On Thu, Mar 18, 2010 at 3:09 PM, Jonathan Gray
> <jg...@facebook.com>
> > >> wrote:
> > >>
> > >> > I would like to see HBase support alternative filesystems in the
> > >> future.
> > >> >  There have been talks of other up and coming DFSs that were
> built
> > >> more for
> > >> > random access that might make sense for some use cases.  I
> imagine
> > a
> > >> time
> > >> > down the road where there would be a choice of DFS depending on
> a
> > >> particular
> > >> > use case.
> > >> >
> > >> > Users coming from the Hadoop world who would be utilizing both
> and
> > >> likely
> > >> > be more tuned towards analytics would just add HBase atop
> Hadoop.
> > >> Someone
> > >> > coming from a relational database who is interested in fast
> > >> read/write
> > >> > random access might be able to choose a DFS more closely suited
> to
> > >> that use
> > >> > case.  Hopefully HDFS gets better at this so it could be the
> > leader
> > >> across
> > >> > the board, but I don't think we should necessarily be married to
> > it.
> > >> >  Besides possible differences in append APIs, in general, it
> > should
> > >> not be
> > >> > difficult to plug a different DFS in (and it's been done in the
> > past
> > >> with
> > >> > kfs).
> > >> >
> > >> > While it would be nice if active HBase committers were eventually
> > >> > made into Hadoop PMC committers, to this point this has not
> > >> > happened (I believe stack was already on Hadoop PMC when HBase
> > >> > became a sub-project).  When we want to add a new committer we now
> > >> > have to build a case to people who actually have no community
> > >> > insight rather than allowing our community (which I believe is big
> > >> > enough to support itself) to make their own decisions.
> > >> >
> > >> > Also, I've not seen Stack's presence on the Hadoop PMC in any
> way
> > >> > contribute to the likelihood of an HDFS patch getting committed.
> > >> >
> > >> > That being said, we would not want to create any bad blood w/
> the
> > >> Hadoop
> > >> > community.  Dhruba, do you think that is a risk?
> > >> >
> > >> > JG
> > >> >
> > >> > > -----Original Message-----
> > >> > > From: Dhruba Borthakur [mailto:dhr...@gmail.com]
> > >> > > Sent: Thursday, March 18, 2010 11:08 AM
> > >> > > To: hbase-dev@hadoop.apache.org
> > >> > > Subject: Re: [DISCUSS] HBase as Apache top-level project?
> > >> > >
> > >> > > Hi Stack,
> > >> > >
> > >> > > Can HBase (in theory) be used on filesystems/MR other than
> > Hadoop?
> > >> > >
> > >> > > I see one primary disadvantage of moving away from the Hadoop
> > >> > > project. Please let me explain. In the Hadoop world, if a
> > >> > > committer is actively contributing code, she/he becomes part of
> > >> > > the Hadoop PMC. This means that active HBase committers would
> > >> > > (over time) become Hadoop PMC members. This might allow
> > >> > > HBase-related fixes to get into HDFS much more easily. If HBase
> > >> > > moves away from Hadoop, then HBase developers will not have a
> > >> > > part to play in guiding HDFS to make it more amenable to HBase
> > >> > > usage.
> > >> > >
> > >> > > The case is different for ZK and avro. They are not related to
> > >> Hadoop
> > >> > > HDFS/MR at all.
> > >> > >
> > >> > > I am not voting against this proposal, just laying out my
> > >> viewpoint.
> > >> > >
> > >> > > thanks,
> > >> > > dhruba
> > >> > >
> > >> > >
> > >> > > On Thu, Mar 18, 2010 at 10:43 AM, Stack <st...@duboce.net>
> > wrote:
> > >> > >
> > >> > > > On Thu, Mar 18, 2010 at 10:15 AM, Andrew Purtell
> > >> > > <apurt...@apache.org>
> > >> > > > wrote:
> > >> > > > >
> > >> > > > > HBase is an integrated optional part of a Hadoop stack
> more
> > >> > > > > than a standalone component, but other ASF TLPs build on
> top
> > >> > > > > of other projects. I suppose HDFS and ZK are going to be
> > TLPs
> > >> > > > > at some point also, is that true? Leaving Hadoop as just
> the
> > >> > > > > MR framework?
> > >> > > >
> > >> > > > If the board allows us be a TLP, Zookeeper would probably be
> > made
> > >> a
> > >> > > > TLP at same time.
> > >> > > >
> > >> > > > There hasn't been a vote, but it seems that the thought is
> > that
> > >> HDFS
> > >> > > > would stay within the hadoop fold; i.e.
> hdfs+mapreduce+common
> > >> would
> > >> > > > stay.
> > >> > > >
> > >> > > > >
> > >> > > > > Anyway, what I like is HBase will stand on its own merits.
> > >> > > > >
> > >> > > > > What are the risks of being a TLP?
> > >> > > > >
> > >> > > >
> > >> > > > I'm sure there are some but I'm blinded by the upside at the
> > >> moment.
> > >> > > >
> > >> > > > St.Ack
> > >> > > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Connect to me at http://www.facebook.com/dhruba
> > >> >
> > >
