On Thu, Mar 18, 2010 at 1:07 PM, Jonathan Gray <jg...@facebook.com> wrote:
> Will HDFS patches aimed at helping the HBase use case (which is not strictly
> limited to HBase but rather our pattern that differs from MR) be any less
> likely to get pushed through if we become a TLP rather than sub-project? In
> reality I don't think the distinction makes a practical difference in that
> sense.
>
If there are hbase-friendly committers up in hadoop, they can marshal
hbase-friendly patches through. Then whether we're under hadoop or a TLP
matters less (though I do think Jay Booth has a good point when he suggests
that the best way to make the case for the hbase hdfs access pattern is to
"stay, and be more annoying"). Currently we have only one hbase committer
who is also a committer in hadoop, and the path to more than this is involved
if we move out from under hadoop (Dhruba's point). It's just been confirmed
that an hbase committer of a year or so vintage qualifies as a nominee to the
hadoop pmc.

> The things that will really help push the HDFS+HBase relationship are things
> like committers of HDFS being users or contributors of HBase. Recent
> interest from Facebook and Cloudera, who each have multiple committers to
> Hadoop, has really pushed things along nicely in recent weeks.
>

This is true. It's for sure made more difference than that one
hbase-friendly committer has made during his tenure as a hadoop committer.

The downside though is that there is nothing to stop the above companies
changing their minds, and then a TLP hbase would be in a hard spot.

St.Ack

> JG
>
>> -----Original Message-----
>> From: Jay Booth [mailto:jaybo...@gmail.com]
>> Sent: Thursday, March 18, 2010 12:45 PM
>> To: hbase-dev@hadoop.apache.org
>> Subject: Re: [DISCUSS] HBase as Apache top-level project?
>>
>> I'm neither an HBase user (just yet) nor a contributor, so my opinion isn't
>> really worth a whole lot here..
>>
>> But I see HBase as being more similar to MapReduce than to ZK or Avro as far
>> as becoming a top-level project. Theoretically you can plug in alternate
>> filesystems but in reality, both systems run on HDFS as of now and might run
>> on other stuff in the future.
>> I agree that there's sometimes been a lack of
>> urgency with regard to HDFS patches that affect HBase but not MapReduce --
>> but I think HBase leaving the project wouldn't really help, and could hurt
>> both HBase and HDFS.
>>
>> In other words, HDFS needs a tenant like HBase to push the use cases that
>> MapReduce doesn't cover -- if there are problems with communication btw
>> subprojects or with HDFS committer priorities, we should address those
>> issues rather than split HBase off and amplify the distance. With MapReduce
>> and HBase both stretching the capabilities, HDFS can continue to evolve into
>> being a (the?) robust, performant, mature distributed filesystem. If it
>> only optimizes for one use case, then it's just a niche i/o layer for
>> mapreduce.
>>
>> So I guess my opinion is, "stay, and be more annoying" :) But in a good
>> way.
>>
>>
>> On Thu, Mar 18, 2010 at 3:09 PM, Jonathan Gray <jg...@facebook.com> wrote:
>>
>> > I would like to see HBase support alternative filesystems in the future.
>> > There have been talks of other up and coming DFSs that were built more for
>> > random access that might make sense for some use cases. I imagine a time
>> > down the road where there would be a choice of DFS depending on a particular
>> > use case.
>> >
>> > Users coming from the Hadoop world who would be utilizing both and likely
>> > be more tuned towards analytics would just add HBase atop Hadoop. Someone
>> > coming from a relational database who is interested in fast read/write
>> > random access might be able to choose a DFS more closely suited to that use
>> > case. Hopefully HDFS gets better at this so it could be the leader across
>> > the board, but I don't think we should necessarily be married to it.
>> > Besides possible differences in append APIs, in general, it should not be
>> > difficult to plug a different DFS in (and it's been done in the past with
>> > kfs).
>> >
>> > While it would be nice if active HBase committers were eventually made into
>> > Hadoop PMC committers, to this point this has not happened (I believe stack
>> > was already on Hadoop PMC when HBase became a sub-project). When we want to
>> > add a new committer we now have to build a case to people who actually have
>> > no community insight rather than allowing our community (which I believe is
>> > big enough to support itself) to make their own decisions.
>> >
>> > Also, I've not seen Stack's presence on the Hadoop PMC in any way
>> > contribute to the likelihood of an HDFS patch getting committed.
>> >
>> > That being said, we would not want to create any bad blood w/ the Hadoop
>> > community. Dhruba, do you think that is a risk?
>> >
>> > JG
>> >
>> > > -----Original Message-----
>> > > From: Dhruba Borthakur [mailto:dhr...@gmail.com]
>> > > Sent: Thursday, March 18, 2010 11:08 AM
>> > > To: hbase-dev@hadoop.apache.org
>> > > Subject: Re: [DISCUSS] HBase as Apache top-level project?
>> > >
>> > > Hi Stack,
>> > >
>> > > Can HBase (in theory) be used on filesystems/MR other than Hadoop?
>> > >
>> > > I see one primary disadvantage of moving away from the Hadoop project.
>> > > Please let me explain. In the Hadoop world, if a committer is actively
>> > > contributing code, she/he becomes part of the Hadoop PMC. This means
>> > > that active HBase committers would (over time) become Hadoop PMC
>> > > members. This might allow HBase-related fixes to get into HDFS much more
>> > > easily. If HBase moves away from Hadoop, then HBase developers will not
>> > > have a part to play in guiding HDFS to make it more amenable to HBase
>> > > usage.
>> > >
>> > > The case is different for ZK and Avro. They are not related to Hadoop
>> > > HDFS/MR at all.
>> > >
>> > > I am not voting against this proposal, just laying out my viewpoint.
>> > >
>> > > thanks,
>> > > dhruba
>> > >
>> > >
>> > > On Thu, Mar 18, 2010 at 10:43 AM, Stack <st...@duboce.net> wrote:
>> > >
>> > > > On Thu, Mar 18, 2010 at 10:15 AM, Andrew Purtell <apurt...@apache.org>
>> > > > wrote:
>> > > > >
>> > > > > HBase is an integrated optional part of a Hadoop stack more
>> > > > > than a standalone component, but other ASF TLPs build on top
>> > > > > of other projects. I suppose HDFS and ZK are going to be TLPs
>> > > > > at some point also, is that true? Leaving Hadoop as just the
>> > > > > MR framework?
>> > > >
>> > > > If the board allows us to be a TLP, Zookeeper would probably be made a
>> > > > TLP at the same time.
>> > > >
>> > > > There hasn't been a vote, but it seems that the thought is that HDFS
>> > > > would stay within the hadoop fold; i.e. hdfs+mapreduce+common would
>> > > > stay.
>> > > >
>> > > > >
>> > > > > Anyway, what I like is HBase will stand on its own merits.
>> > > > >
>> > > > > What are the risks of being a TLP?
>> > > > >
>> > > >
>> > > > I'm sure there are some but I'm blinded by the upside at the moment.
>> > > >
>> > > > St.Ack
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Connect to me at http://www.facebook.com/dhruba
>> > >