Hey Todd,

Bulk loading isn't always an option when data is streaming in from a live application. Many big data use cases involve massive numbers of small items in the 10-100 byte range, such as URLs, sensor readings, genome sequence reads, and network traffic logs. If HBase requires 2-3 times as much hardware to avoid *Concurrent mode failures*, then HBase is 2-3 times more expensive in terms of hardware, power consumption, and datacenter real estate.
What takes the most time is getting the core database mechanics right (we're going on five years now). Once the core database is stable, integration with applications such as Solr is a short-term project. I believe that, sooner or later, most engineers working in this space will conclude that Java is the wrong language for this kind of database application. At that point, folks on the HBase project will realize that they are five years behind.

- Doug

On Mon, Feb 13, 2012 at 11:33 AM, Todd Lipcon <[email protected]> wrote:
> Hey Doug,
>
> Want to also run a comparison test with inter-cluster replication
> turned on? How about kerberos-based security on secure HDFS? How about
> ACLs or other table permissions even without strong authentication?
> Can you run a test comparing performance running on top of Hadoop
> 0.23? How about running other ecosystem products like Solbase,
> Havrobase, and Lily, or commercial products like Digital Reasoning's
> Synthesys, etc?
>
> For those unfamiliar, the answer to all of the above is that those
> comparisons can't be run because Hypertable is years behind HBase in
> terms of features, adoption, etc. They've found a set of benchmarks
> they win at, but bulk loading either database through the "put" API is
> the wrong way to go about it anyway. Anyone loading 5T of data like
> this would use the bulk load APIs which are one to two orders of
> magnitude more efficient. Just ask the Yahoo crawl cache team, who has
> ~1PB stored in HBase, or Facebook, or eBay, or many others who store
> hundreds to thousands of TBs in HBase today.
>
> Thanks,
> -Todd
>
> On Mon, Feb 13, 2012 at 9:07 AM, Doug Judd <[email protected]> wrote:
> > In our original test, we mistakenly ran the HBase test with
> > the hbase.hregion.memstore.mslab.enabled property set to false.
> > We re-ran the test with the hbase.hregion.memstore.mslab.enabled
> > property set to true and have reported the results in the following
> > addendum:
> >
> > Addendum to Hypertable vs. HBase Performance
> > Test <http://www.hypertable.com/why_hypertable/hypertable_vs_hbase_2/addendum/>
> >
> > Synopsis: It slowed performance on the 10KB and 1KB tests and still failed
> > the 100 byte and 10 byte tests with *Concurrent mode failure*
> >
> > - Doug
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
