Hi Jim,

> BTW: Cloudera's next release is going to be based on 0.20, and
> they will either include HBase as alpha software, or put us
> in their supported stack, depending on the reaction from our
> community.
What does that mean, "depending on the reaction from our community"?

> If we do this, Cloudera has volunteered to run that
> script on EC2 on a ~100 node cluster to burn it in. (They have
> some arrangement with Amazon) and they have volunteered to run
> the test on a "big" cluster for us.

I think HBase, and frankly Hadoop as well, could use a reasonably scaled performance, reliability, and fault tolerance automated test platform. (See "Re: scanner is returning everything in parent region plus one of the daughters?")

Think of it as expanding Hudson to a cluster of several nodes hosted with community resources, perhaps on EC2, running some suite once per day, or perhaps triggered by a project once it reaches a certain milestone. Each project could be allocated a budget in terms of hours/month, with time limits of hours/day or similar. ~10 nodes seems reasonably affordable, with ~100 used on occasion, the difference being daily versus weekly runs, or weekly versus monthly.

Stepping back from blue sky, I wonder if HBase on its own can pool resources to run such a reasonably scaled performance, reliability, and fault tolerance automated test at least twice a week. 10 extra large EC2 instances running 5 hours per day is about $300/month.

- Andy

________________________________
From: Jim Kellerman (POWERSET) <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Sunday, June 14, 2009 3:03:01 AM
Subject: You guys rocked the house this week!

I have received nothing but compliments in all my schmoozing this week. Although I was mostly absent from 0.20, it is 0.20 that has everyone excited. Congrats, and great work guys.

However, we still have to deliver on 0.20. It has to be rock solid, or the buzz will turn against us.
Friday, I was at Cloudera along with Doug C, Eric14, Owen O'Malley, Arun Murthy, Alan Gates (Pig), a guy from Hive (whose name I can't remember at the moment), Dhruba (Facebook) and, of course, the Cloudera guys (Todd Lipcon, Jeff Hammerbacher, Christophe, Amr, etc.)

The day went something like this:

1. 1st exercise: write (on a post-it) 5 things you like about Hadoop and 5 things you don't (most people submitted more than 10), followed by discussion.

2. 2nd exercise: write (on a post-it) features that you'd like to see in the short term in Hadoop.
   - We had submissions that were truly short term and some that were truly "blue sky". These were divided into categories: Map/Reduce, HDFS, Build/Test, Core (including Avro).
   - We then split up into separate sessions. I attended HDFS. (The session leaders are supposed to send in notes from their sessions, and as soon as I get them, I will post them.)
   - The biggest issue from HDFS was append (actually flush/sync), and not just from me; there were about 7 votes for it (just "append"), whereas my votes were like "flush/sync in 0.21" and HADOOP-4379 in 0.20.x.

3. Third session: blue sky. Not much happened here because everyone was kind of burned out at this point.

Important points (for HBase):

1. We need to deliver a rock-solid 0.20 release or we will lose all the credibility that we gained this week.

   BTW: Cloudera's next release is going to be based on 0.20, and they will either include HBase as alpha software, or put us in their supported stack, depending on the reaction from our community. And despite the fact that their revenue stream depends on the Hadoop community, I got the feeling that they are getting pressured to have a version of HBase (not so much on 0.18, but more on 0.20). They have a '$' interest in seeing us succeed.

2. Once we get 0.20 out, we need to focus on beating the sh*t out of the HADOOP-4379 patch for 0.20. Once we think it is solid, we need to create a script that randomly fails region servers and datanodes.
   If we do this, Cloudera has volunteered to run that script on EC2 on a ~100 node cluster to burn it in. (They have some arrangement with Amazon) and they have volunteered to run the test on a "big" cluster for us. They will run it for several days if necessary to prove that it works.

   We need to be sure HADOOP-4379 is solid, which could lead to getting 4379 into Hadoop 0.20.x if so. Dhruba, who led the HDFS breakout session, will do what it takes to fix issues around his current patch, provided we give him feedback. However, if we don't do #1 above, it won't matter.

   We also need to verify that:
   -- Master failover works.
   -- Region server failover works.

3. After this week, both Pig and Hive are excited about using HBase as a source and a sink for the map-reduce jobs that they spawn. They have both come to realize that we are becoming more important in the Hadoop community, and are willing to devote resources to make their stuff work with HBase. (They will look bad if the other supports HBase and they do not - although to be fair, there was no data store available before this that met their needs either.)

So keep up the great work and MAKE SURE 0.20 IS ROCK SOLID STABLE!!!

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
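For what it's worth, the "randomly fail region servers and datanodes" script Jim describes could be sketched roughly as below. This is a minimal, hypothetical fault-injection loop, not existing tooling: the host names, the daemon class names (HRegionServer, DataNode), and the ssh/pkill kill mechanism are all assumptions to adapt to a real cluster.

```python
#!/usr/bin/env python
# Sketch of a fault-injection script: periodically kill one randomly
# chosen HBase region server or HDFS datanode. All names here are
# illustrative assumptions, not part of any shipped HBase tooling.
import random
import subprocess
import time

# Hypothetical cluster inventory: (host, daemon main class name).
TARGETS = [
    ("slave1", "HRegionServer"),
    ("slave1", "DataNode"),
    ("slave2", "HRegionServer"),
    ("slave2", "DataNode"),
]

def build_kill_command(host, daemon):
    """Return an ssh command that kills the named daemon on the host.

    `pkill -f` matches the daemon's main class anywhere on the java
    command line of the target process.
    """
    return ["ssh", host, "pkill", "-f", daemon]

def chaos_loop(iterations, interval_seconds, run=subprocess.call):
    """Kill one randomly chosen daemon per interval, `iterations` times.

    `run` is injectable so the loop can be dry-run or tested without
    actually touching a cluster.
    """
    for _ in range(iterations):
        host, daemon = random.choice(TARGETS)
        print("killing %s on %s" % (daemon, host))
        run(build_kill_command(host, daemon))
        time.sleep(interval_seconds)

if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    chaos_loop(iterations=3, interval_seconds=0, run=print)
```

On a real burn-in run you would drop the dry-run `run=print`, stretch the interval (e.g. one kill every few minutes for several days, as Jim suggests), and pair it with the daemons' own supervision/restart scripts so the cluster keeps limping along between failures.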
