On Tue, May 11, 2010 at 5:40 PM, Jeff Hammerbacher <ham...@cloudera.com> wrote:

> Okay, the assertion that HBase is only interesting if you need HDFS is
> continuing to rankle for me. On the surface, it sounds reasonable, but it's
> just so wrong. The specifics cited (caching, HFile, and compaction) are
> actually all advantages of the HBase design.
>
> 1) Caching: any data store which targets multiple kinds of storage media
> with different latency characteristics will cache. Not interesting, and
> totally confusing to me how this could be cited as a disadvantage.
> 2) HFile: HFile is an on-disk layout of data to minimize seeks for random
> accesses while not hampering scans. Every system which stores data to
> magnetic drives must decide how to lay bits out on platters. HFile doesn't
> go that low, of course, but it's not an artifact of HBase using HDFS; see,
> e.g., https://issues.apache.org/jira/browse/CASSANDRA-674 or
> http://blog.basho.com/2010/04/27/hello,-bitcask/. Avro defines an object
> container file format
> (http://avro.apache.org/docs/current/spec.html#Object+Container+Files) for
> the same purpose. HFile squeezes a lot of performance out of Java and is a
> pretty reasonable implementation. Again, I'm totally confused why this is
> cited as a disadvantage. (A toy sketch of the HFile/compaction idea
> follows after point 3.)
> 3) Compactions: HBase, like many modern data stores, is really just a
> hierarchy of buffers; some in memory, some on disk. Because of the
> characteristics of magnetic storage, this log-structured merge tree
> strategy does a nice job of minimizing seeks on the write path while
> reducing disk fragmentation on the read path. There is a slight penalty on
> the read path, as data can live in any of the buffers, but if you've ever
> managed a long-lived MySQL database, you'll be glad to amortize your pain
> across each read rather than paying the huge penalty of having a highly
> fragmented database send the disk head all across the disk during a scan.
> It is true that you could dedicate a single disk to the WAL rather than
> putting it on a DFS, and that may result in better performance; on the
> other hand, you increase system complexity, as you now have to implement
> replication and consistency guarantees for the WAL data if you want to
> survive machine failure.
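>
> To make 2) and 3) concrete, here is a toy sketch, in deliberately
> simplified Java, of the write path those points describe: writes land in
> a sorted in-memory buffer, flushes produce immutable sorted runs (the
> HFile-like files), and a compaction merges the runs back together. The
> class and method names are invented for illustration; this is not HBase's
> actual code.
>
> import java.util.*;
>
> // Toy log-structured merge store: an in-memory sorted buffer ("memstore")
> // is flushed to immutable sorted runs, and a compaction merges the runs.
> public class ToyLsmStore {
>   private final NavigableMap<String, String> memstore = new TreeMap<>();
>   private final List<NavigableMap<String, String>> runs = new ArrayList<>();
>   private final int flushThreshold;
>
>   public ToyLsmStore(int flushThreshold) {
>     this.flushThreshold = flushThreshold;
>   }
>
>   // Writes are sequential-friendly: append to the WAL (omitted here) and
>   // insert into the sorted buffer; no random disk seeks on the write path.
>   public void put(String key, String value) {
>     memstore.put(key, value);
>     if (memstore.size() >= flushThreshold) {
>       flush();
>     }
>   }
>
>   // Flush: persist the sorted buffer as one immutable sorted run.
>   private void flush() {
>     runs.add(new TreeMap<>(memstore));
>     memstore.clear();
>   }
>
>   // Reads consult every buffer, newest first -- the "slight penalty on
>   // the read path" mentioned above.
>   public String get(String key) {
>     if (memstore.containsKey(key)) {
>       return memstore.get(key);
>     }
>     for (int i = runs.size() - 1; i >= 0; i--) {
>       String v = runs.get(i).get(key);
>       if (v != null) {
>         return v;
>       }
>     }
>     return null;
>   }
>
>   // Compaction: merge all runs, keeping the newest value per key, so
>   // future reads touch fewer files and scans stay mostly sequential.
>   public void compact() {
>     NavigableMap<String, String> merged = new TreeMap<>();
>     for (NavigableMap<String, String> run : runs) {
>       merged.putAll(run);  // later (newer) runs overwrite older values
>     }
>     runs.clear();
>     runs.add(merged);
>   }
> }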
>
> I certainly don't want this consternation to be perceived as ad hominem:
> I'm much more frustrated by the logic of the statement seeming reasonable
> on the surface, which is the level at which most people are able to
> evaluate systems, but being just completely wrong when examined in detail.
> There are just too many storage systems to choose from these days, and
> specious arguments for one or the other must be put to rest so users can
> make well-informed decisions and not just latch on to the next shiny
> object that comes along.
>
>
> On Tue, May 11, 2010 at 2:03 PM, Jeff Hammerbacher <ham...@cloudera.com>
> wrote:
>
> > Hey Edward,
> >
> > Database systems have been built for decades against a storage medium
> > (spinning magnetic platters) which has the same characteristics you
> > point out in HDFS. In the interim, they've managed to service a large
> > number of low-latency workloads in a reasonable fashion. There's a
> > reason the capstone assignment in the first databases course at
> > Wisconsin was for years an implementation of the PostgreSQL buffer
> > pool--the caching logic for low-latency random access is the hard part.
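> >
> > As a purely illustrative aside (this is not the PostgreSQL or HBase
> > implementation), the core of such a buffer pool can be sketched as a
> > fixed-capacity cache of disk blocks with LRU eviction; the genuinely
> > hard parts in a real system (pinning, dirty-page write-back, and
> > concurrency) are omitted. The names below are made up.
> >
> > import java.util.LinkedHashMap;
> > import java.util.Map;
> >
> > // Toy buffer pool: caches fixed-size blocks by block id and evicts the
> > // least recently used block when the pool is full.
> > public class ToyBufferPool {
> >   private final int capacity;
> >   private final Map<Long, byte[]> blocks;
> >
> >   public ToyBufferPool(int capacity) {
> >     this.capacity = capacity;
> >     // accessOrder=true makes iteration order reflect recency of access.
> >     this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
> >       @Override
> >       protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
> >         return size() > ToyBufferPool.this.capacity;
> >       }
> >     };
> >   }
> >
> >   // Return the cached block, or read it from disk (stubbed) on a miss.
> >   public byte[] getBlock(long blockId) {
> >     return blocks.computeIfAbsent(blockId, this::readFromDisk);
> >   }
> >
> >   private byte[] readFromDisk(long blockId) {
> >     return new byte[4096]; // stand-in for a real random read
> >   }
> > }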
> >
> > Having participated in the design and implementation of one of these
> > other data stores to which you refer, I agree that there are flaws in
> > the BigTable design. On the other hand, Solaris and Mach look a lot
> > better on paper than the Linux kernel. If you consider HBase to be a
> > direct implementation of the BigTable design, then I would argue that
> > system has unequivocally proven its utility at scale. Choosing a
> > technology based on the problems it has solved rather than the elegance
> > of the design helps minimize project risk, in my experience.
> >
> > Some day soon, as with databases, only a small subset of people well
> > versed in systems design will be arguing over implementation strategies.
> > The rest of the world will be using these technologies to solve problems
> > and be worried more about the interfaces they provide. I'm excited for
> > HBase to reach that stage.
> >
> > Thanks,
> > Jeff
> >
> >
> > On Tue, May 11, 2010 at 1:28 PM, Edward Capriolo <edlinuxg...@gmail.com>
> > wrote:
> >
> >> On Tue, May 11, 2010 at 3:51 PM, Jeff Hammerbacher <ham...@cloudera.com>
> >> wrote:
> >>
> >> > Hey,
> >> >
> >> > Thanks for the evaluation, Andrew. Ceph certainly is elegant in
> >> > design; HDFS, similar to GFS [1], was purpose-built to get into
> >> > production quickly, so its current incarnation lacks some of the same
> >> > elegance. On the other hand, there are many techniques for making the
> >> > metadata servers scalable and highly available. HDFS has the
> >> > advantage of already storing hundreds of petabytes across thousands
> >> > of organizations, so we're able to guide those design decisions with
> >> > empirical data from heavily used clusters. We'd love to have heavy
> >> > users of HBase contribute to the discussions of scalability [2] and
> >> > availability [3] of HDFS. See also the excellent article from
> >> > Konstantin Shvachko of Yahoo! on HDFS scalability [4].
> >> >
> >> > I've also conducted extensive reviews at both Facebook and now at
> >> > Cloudera of alternative file systems, but at this stage, I concur
> >> > with Andrew: HDFS is the only reasonable open source choice for
> >> > production data processing workloads. I'm also optimistic that the
> >> > scalability and availability challenges will be addressed by the
> >> > (very active and diverse) HDFS developer community over the next few
> >> > years, and we'll benefit from the work that's already been put into
> >> > the robustness and manageability of the system.
> >> >
> >> > Regardless, every technology improves more rapidly when there's strong
> >> > competition, so it will be good to see one of these other file systems
> >> > emerge as a viable alternative to HDFS for HBase storage some day.
> >> >
> >> > [1] http://cacm.acm.org/magazines/2010/3/76283-gfs-evolution-on-fast-forward/fulltext
> >> > [2] https://issues.apache.org/jira/browse/HDFS-1051
> >> > [3] https://issues.apache.org/jira/browse/HDFS-1064
> >> > [4] http://developer.yahoo.net/blogs/hadoop/2010/05/scalability_of_the_hadoop_dist.html
> >> >
> >> > Later,
> >> > Jeff
> >> >
> >> > On Sun, May 9, 2010 at 9:44 AM, Andrew Purtell <apurt...@apache.org>
> >> > wrote:
> >> >
> >> > > Our experience with Gluster 2 is that self heal when a brick drops
> >> > > off the network is very painful. The performance impact is severe
> >> > > and lasts for a long time. I'm not sure, but I think Gluster 3 may
> >> > > only re-replicate missing sections instead of entire files. On the
> >> > > other hand I would not trust Gluster 3 to be stable (yet).
> >> > >
> >> > > I've also tried KFS. My experience seems to bear out other
> >> > > observations that it is ~30% slower than HDFS. Also, I was unable
> >> > > to keep the chunkservers up on my CentOS 5 based 64-bit systems. I
> >> > > gave Sriram shell access so he could poke around coredumps with
> >> > > gdb, but there was no satisfactory resolution.
> >> > >
> >> > > Another team at Trend is looking at Ceph. I think it is a highly
> >> > > promising filesystem, but at the moment it is an experimental
> >> > > filesystem undergoing a high rate of development that requires
> >> > > another experimental filesystem undergoing a high rate of
> >> > > development (btrfs) for recovery semantics, and the web site warns
> >> > > "NOT SAFE YET" or similar. I doubt it has ever been tested on
> >> > > clusters > 100 nodes. In contrast, HDFS has been running in
> >> > > production on clusters with 1000s of nodes for a long time.
> >> > >
> >> > > There currently is not a credible competitor to HDFS in my opinion.
> >> > > Ceph is definitely worth keeping an eye on, however. I wonder if
> >> > > HDFS will evolve to offer a similar scalable metadata service
> >> > > (NameNode) to compete. Certainly that would improve its scalability
> >> > > and availability story, both issues today presenting barriers to
> >> > > adoption, and barriers for anything layered on top, like HBase.
> >> > >
> >> > >   - Andy
> >> > >
> >> > >
> >> > > > From: Kevin Apte
> >> > > > Subject: Using HBase on other file systems
> >> > > > To: hbase-user@hadoop.apache.org
> >> > > > Date: Sunday, May 9, 2010, 5:08 AM
> >> > > >
> >> > > > I am wondering if anyone has thought about using HBase on other
> >> > > > file systems like "Gluster". I think Gluster may offer much
> >> > > > faster performance without exorbitant cost. With Gluster, you
> >> > > > would have to fetch the data from the "Storage Bricks" and
> >> > > > process it in your own environment. This allows the servers that
> >> > > > are used as storage nodes to be very cheap.
> >> > >
> >> > >
> >> >
> >>
> >> HBase is the most square-peg, round-hole piece of software ever (not an
> >> insult, read on). HDFS was designed for high-throughput streaming batch
> >> processing. The random access support is not good. HBase gets around
> >> the HDFS shortcomings using caching, HFiles, compaction processes, etc.
> >> to make HDFS (essentially a tape drive) seem great at all these things
> >> it is not good at.
> >>
> >> One compelling reason to use HBase is that you are already using HDFS
> >> for other things. IMHO, if you do not need HDFS, you do not really need
> >> HBase. One of the other unnamed distributed key-value stores will get
> >> the job done.
> >>
> >
> >
>


Jeff,

>> Choosing a technology based on the problems it has solved rather than
>> the elegance of the design helps minimize project risk, in my
>> experience.

I agree with this. I am not trying to imply that HBase is risky or not
proven at scale. I do think that if you compare GoogleFS to HDFS, GFS looks
more full-featured. HDFS seems to be very focused on what I consider a pure
implementation, primarily designed for MapReduce workloads.

I do believe my logic is reasonable. HBase has a lot of code designed
around HDFS. We all know the tickets that get cited all the time, for
better random reads or for sync() support. HBase gets the benefits of HDFS
and has to deal with its drawbacks. Other key-value stores handle storage
directly.

>> Okay, the assertion that HBase is only interesting if you need HDFS is
>> continuing to rankle for me.

Do not be rankled :) What I meant, more or less, is that HBase is always a
solution for a key-value store. It is an even better solution if you want
the underlying data stored on HDFS so you can run map/reduce efficiently on
the data.

However, since the topic started with "Can I run HBase on top of something
besides HDFS?", the quick answers are:
theoretically: yes
practically: as in by tomorrow, without knowledge of the code base: no
From the outside looking in: if you do not want HDFS you probably do not
want HBase.
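
For what it is worth, the "theoretically" part comes from HBase talking to
storage through Hadoop's FileSystem abstraction rather than to HDFS
directly, so hbase.rootdir can point at any FileSystem implementation
(standalone mode already uses the local filesystem, for example). A minimal
sketch of the idea; any real alternative would also have to provide the
durability guarantees, like sync(), that HBase expects:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RootDirExample {
  public static void main(String[] args) {
    // HBase resolves its root directory through the Hadoop FileSystem API,
    // so any registered scheme can back it.
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "file:///tmp/hbase");  // local FS, not HDFS
    // conf.set("hbase.rootdir", "hdfs://namenode:9000/hbase");  // typical
    System.out.println(conf.get("hbase.rootdir"));
  }
}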

:)
