Re: Addendum to Hypertable vs. HBase Performance Test (w/ mslab enabled)

Doug Judd Fri, 17 Feb 2012 16:25:49 -0800

Hi Edward,

The problem is that even if the workload is 5% write and 95% read, if you
can't load the data, you need more machines.  In the 167 billion insert
test, HBase failed with *Concurrent mode failure* after 20% of the data was
loaded.  One of our customers has loaded 1/2 trillion records of historical
financial market data on 16 machines.  If you do the back-of-the-envelope
calculation, it would take about 180 machines for HBase to load 1/2
trillion cells.  That makes HBase 10X more expensive in terms of hardware,
power consumption, and data center real estate.


- Doug

On Fri, Feb 17, 2012 at 3:58 PM, Edward Capriolo wrote:

> I would almost agree with that perspective. But there is a problem with the
> 'Java is slow' theory: in a 100 percent write workload, GC might be a
> factor.
>
> But in the real world, people have to read data, and reads become
> disk-bound as your data grows larger than memory.
>
> Unless C++ can make your disk spin faster than Java, it is a wash. Claiming
> that you're going to need more servers for Java/HBase is bogus. To put it in
> perspective, if the workload is 5% write and 95% read, you are probably
> going to need the same amount of hardware.
>
> You might get some win on the read side, because your custom caching could
> be more efficient in terms of object size in memory and other GC issues, but
> it is not 2 or 3 to 1.
>
> If a million writes fall into a Hypertable forest but it takes a billion
> years to read them back, did the writes ever sync? :)
>
>
> On Monday, February 13, 2012, Doug Judd wrote:
> > Hey Todd,
> >
> > Bulk loading isn't always an option when data is streaming in from a live
> > application.  Many big data use cases involve massive amounts of smaller
> > items in the size range of 10-100 bytes, for example URLs, sensor readings,
> > genome sequence reads, network traffic logs, etc.  If HBase requires 2-3
> > times the amount of hardware to avoid *Concurrent mode failures*, then that
> > makes HBase 2-3 times more expensive from the standpoint of hardware, power
> > consumption, and datacenter real estate.
> >
> > What takes the most time is getting the core database mechanics right
> > (we're going on 5 years now).  Once the core database is stable,
> > integrations with applications such as Solr and others are short-term
> > projects.  I believe that sooner or later, most engineers working in this
> > space will come to the conclusion that Java is the wrong language for this
> > kind of database application.  At that point, folks on the HBase project
> > will realize that they are five years behind.
> >
> > - Doug
> >
> > On Mon, Feb 13, 2012 at 11:33 AM, Todd Lipcon wrote:
> >
> >> Hey Doug,
> >>
> >> Want to also run a comparison test with inter-cluster replication
> >> turned on? How about kerberos-based security on secure HDFS? How about
> >> ACLs or other table permissions even without strong authentication?
> >> Can you run a test comparing performance running on top of Hadoop
> >> 0.23? How about running other ecosystem products like Solbase,
> >> Havrobase, and Lily, or commercial products like Digital Reasoning's
> >> Synthesys, etc?
> >>
> >> For those unfamiliar, the answer to all of the above is that those
> >> comparisons can't be run because Hypertable is years behind HBase in
> >> terms of features, adoption, etc. They've found a set of benchmarks
> >> they win at, but bulk loading either database through the "put" API is
> >> the wrong way to go about it anyway. Anyone loading 5TB of data like
> >> this would use the bulk load APIs, which are one to two orders of
> >> magnitude more efficient. Just ask the Yahoo crawl cache team, which has
> >> ~1PB stored in HBase, or Facebook, or eBay, or the many others who store
> >> hundreds to thousands of TBs in HBase today.
> >>
> >> Thanks,
> >> -Todd
> >>
> >> On Mon, Feb 13, 2012 at 9:07 AM, Doug Judd wrote:
> >> > In our original test, we mistakenly ran the HBase test with
> >> > the hbase.hregion.memstore.mslab.enabled property set to false.  We
> >> > re-ran the test with the hbase.hregion.memstore.mslab.enabled property
> >> > set to true and have reported the results in the following addendum:
> >> >
> >> > Addendum to Hypertable vs. HBase Performance Test
> >> > <http://www.hypertable.com/why_hypertable/hypertable_vs_hbase_2/addendum/>
> >> >
> >> > Synopsis: Enabling MSLAB slowed performance on the 10KB and 1KB tests,
> >> > and HBase still failed the 100-byte and 10-byte tests with
> >> > *Concurrent mode failure*.
> >> >
> >> > - Doug
> >>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
>
