One other thought is that in the not-so-distant future HBase may take on subprojects of its own.
> -----Original Message-----
> From: Jonathan Gray [mailto:jg...@facebook.com]
> Sent: Thursday, March 18, 2010 5:00 PM
> To: hbase-dev@hadoop.apache.org
> Subject: RE: [DISCUSS] HBase as Apache top-level project?
>
> Isn't the hard spot where we've always been? :)
>
> Annoyance has really not gotten us anywhere. And I don't think it
> matters to those in Hadoop whether we are a TLP or SP, they will not
> (or should not) be offended if we break off. Do you think they would
> take us (or our patches) less seriously if we were a TLP?
>
> What has pushed things forward is continuing to make HBase better so
> that more people want to use it. A larger community and involvement
> from larger companies will help push Hadoop changes aimed at HBase,
> especially when those companies are Hadoop contributors.
>
> I think being a TLP is good because it gives us autonomy, more
> visibility, and some kind of external validation from Apache that HBase
> has risen to that level (which I believe it has). I see the risks as
> not too serious.
>
> If we do think we can get some HBase committers onto the Hadoop PMC,
> and we think that this will make a material difference in outcomes for
> us, then my opinion may change. Today I don't really think the issue
> is whether we are on the Hadoop PMC or not... my understanding is that
> big decisions are not voted on for a majority, if someone votes against
> it then it is tabled.
>
> JG
>
> > -----Original Message-----
> > From: saint....@gmail.com [mailto:saint....@gmail.com] On Behalf Of Stack
> > Sent: Thursday, March 18, 2010 4:09 PM
> > To: hbase-dev@hadoop.apache.org
> > Subject: Re: [DISCUSS] HBase as Apache top-level project?
> >
> > On Thu, Mar 18, 2010 at 1:07 PM, Jonathan Gray <jg...@facebook.com> wrote:
> > > Will HDFS patches aimed at helping the HBase use case (which is not
> > > strictly limited to HBase but rather our pattern that differs from MR)
> > > be any less likely to get pushed through if we become a TLP rather than
> > > sub-project? In reality I don't think the distinction makes a
> > > practical difference in that sense.
> > >
> >
> > If there are hbase-friendly committers up in hadoop they can marshal
> > through hbase-friendly patches. Then whether we're under hadoop or
> > TLP matters less (though I do think Jay Booth has a good point when he
> > suggests that the best way to make the case for the hbase hdfs access
> > pattern is to "stay, and be more annoying...")
> >
> > Currently we have only one hbase committer who is also a committer in
> > hadoop and the path to more than this is involved if we move out from
> > under hadoop, Dhruba's point (It's just been confirmed that an hbase
> > committer of a year or so vintage qualifies as a nominee to hadoop
> > pmc).
> >
> > > The things that will really help push the HDFS+HBase relationship are
> > > things like committers of HDFS being users or contributors of HBase.
> > > Recent interest from Facebook and Cloudera, who each have multiple
> > > committers to Hadoop, has really pushed things along nicely in recent
> > > weeks.
> > >
> >
> > This is true. It's for sure made more difference than that one
> > hbase-friendly committer has done during his tenure as a hadoop
> > committer.
> >
> > The downside though is that there is nothing to stop the above
> > companies changing their minds and then a TLP hbase would be in a
> > hard spot.
> >
> > St.Ack
> >
> > > JG
> > >
> > >> -----Original Message-----
> > >> From: Jay Booth [mailto:jaybo...@gmail.com]
> > >> Sent: Thursday, March 18, 2010 12:45 PM
> > >> To: hbase-dev@hadoop.apache.org
> > >> Subject: Re: [DISCUSS] HBase as Apache top-level project?
> > >>
> > >> I'm neither an HBase user (just yet) nor a contributor so my opinion
> > >> isn't really worth a whole lot here..
> > >>
> > >> But I see HBase as being more similar to MapReduce than to ZK or Avro
> > >> as far as becoming a top-level project. Theoretically you can plug in
> > >> alternate filesystems but in reality, both systems run on HDFS as of
> > >> now and might run on other stuff in the future. I agree that there's
> > >> sometimes been a lack of urgency with regard to HDFS patches that
> > >> affect HBase but not MapReduce -- but I think HBase leaving the
> > >> project wouldn't really help, and could hurt both HBase and HDFS.
> > >>
> > >> In other words, HDFS needs a tenant like HBase to push the use cases
> > >> that MapReduce doesn't cover -- if there are problems with
> > >> communication btw subprojects or with HDFS committer priorities, we
> > >> should address those issues rather than split HBase off and amplify
> > >> the distance. With MapReduce and HBase both stretching the
> > >> capabilities, HDFS can continue to evolve into being a (the?) robust,
> > >> performant, mature distributed filesystem. If it only optimizes for
> > >> one use case, then it's just a niche i/o layer for mapreduce.
> > >>
> > >> So I guess my opinion is, "stay, and be more annoying" :) But in a
> > >> good way.
> > >>
> > >>
> > >> On Thu, Mar 18, 2010 at 3:09 PM, Jonathan Gray <jg...@facebook.com> wrote:
> > >>
> > >> > I would like to see HBase support alternative filesystems in the
> > >> > future. There have been talks of other up and coming DFSs that were
> > >> > built more for random access that might make sense for some use
> > >> > cases. I imagine a time down the road where there would be a choice
> > >> > of DFS depending on a particular use case.
> > >> >
> > >> > Users coming from the Hadoop world who would be utilizing both and
> > >> > likely be more tuned towards analytics would just add HBase atop
> > >> > Hadoop. Someone coming from a relational database who is interested
> > >> > in fast read/write random access might be able to choose a DFS more
> > >> > closely suited to that use case. Hopefully HDFS gets better at this
> > >> > so it could be the leader across the board, but I don't think we
> > >> > should necessarily be married to it. Besides possible differences
> > >> > in append APIs, in general, it should not be difficult to plug a
> > >> > different DFS in (and it's been done in the past with kfs).
> > >> >
> > >> > While it would be nice if active HBase committers were eventually
> > >> > made into Hadoop PMC committers, to this point this has not happened
> > >> > (I believe stack was already on Hadoop PMC when HBase became a
> > >> > sub-project). When we want to add a new committer we now have to
> > >> > build a case to people who actually have no community insight rather
> > >> > than allowing our community (which I believe is big enough to
> > >> > support itself) to make their own decisions.
> > >> >
> > >> > Also, I've not seen Stack's presence on the Hadoop PMC in any way
> > >> > contribute to the likelihood of an HDFS patch getting committed.
> > >> >
> > >> > That being said, we would not want to create any bad blood w/ the
> > >> > Hadoop community. Dhruba, do you think that is a risk?
> > >> >
> > >> > JG
> > >> >
> > >> > > -----Original Message-----
> > >> > > From: Dhruba Borthakur [mailto:dhr...@gmail.com]
> > >> > > Sent: Thursday, March 18, 2010 11:08 AM
> > >> > > To: hbase-dev@hadoop.apache.org
> > >> > > Subject: Re: [DISCUSS] HBase as Apache top-level project?
> > >> > >
> > >> > > Hi Stack,
> > >> > >
> > >> > > Can HBase (in theory) be used on filesystems/MR other than Hadoop?
> > >> > >
> > >> > > I see one primary disadvantage of moving away from the Hadoop
> > >> > > project. Please let me explain. In the Hadoop world, if a
> > >> > > committer is actively contributing code, she/he becomes part of
> > >> > > the Hadoop PMC. This means that active HBase committers would
> > >> > > (over time) become Hadoop PMC members. This might allow
> > >> > > HBase-related fixes to get into HDFS much more easily. If HBase
> > >> > > moves away from Hadoop, then HBase developers will not have a part
> > >> > > to play in guiding HDFS to make it more amenable to HBase usage.
> > >> > >
> > >> > > The case is different for ZK and avro. They are not related to
> > >> > > Hadoop HDFS/MR at all.
> > >> > >
> > >> > > I am not voting against this proposal, just laying out my
> > >> > > viewpoint.
> > >> > >
> > >> > > thanks,
> > >> > > dhruba
> > >> > >
> > >> > >
> > >> > > On Thu, Mar 18, 2010 at 10:43 AM, Stack <st...@duboce.net> wrote:
> > >> > >
> > >> > > > On Thu, Mar 18, 2010 at 10:15 AM, Andrew Purtell
> > >> > > > <apurt...@apache.org> wrote:
> > >> > > > >
> > >> > > > > HBase is an integrated optional part of a Hadoop stack more
> > >> > > > > than a standalone component, but other ASF TLPs build on top
> > >> > > > > of other projects. I suppose HDFS and ZK are going to be TLPs
> > >> > > > > at some point also, is that true? Leaving Hadoop as just the
> > >> > > > > MR framework?
> > >> > > >
> > >> > > > If the board allows us to be a TLP, Zookeeper would probably be
> > >> > > > made a TLP at same time.
> > >> > > >
> > >> > > > There hasn't been a vote, but it seems that the thought is that
> > >> > > > HDFS would stay within the hadoop fold; i.e.
> > >> > > > hdfs+mapreduce+common would stay.
> > >> > > >
> > >> > > > >
> > >> > > > > Anyway, what I like is HBase will stand on its own merits.
> > >> > > > >
> > >> > > > > What are the risks of being a TLP?
> > >> > > > >
> > >> > > >
> > >> > > > I'm sure there are some but I'm blinded by the upside at the
> > >> > > > moment.
> > >> > > >
> > >> > > > St.Ack
> > >> > > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Connect to me at http://www.facebook.com/dhruba