Todd, thanks for replying. So 4x 7200RPM spindles and no RAID works out to
roughly 360 random IOPS to/from the backend storage - and that is the
per-node minimum needed to run an HBase cluster. Right?
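For the record, here is the back-of-envelope arithmetic behind that figure.
The per-spindle numbers are my own assumptions for a commodity 7200RPM SATA
disk, not anything from your mail:

// Rough random-IOPS budget for a JBOD node.
// Assumed: ~4.2ms avg rotational latency (half a rev at 7200RPM)
// plus ~7ms avg seek time per random IO.
public class NodeIopsEstimate {
    public static void main(String[] args) {
        int spindles = 4;
        double avgLatencyMs = (60000.0 / 7200 / 2) + 7.0; // ~11.2ms per IO
        double nodeIops = spindles * (1000.0 / avgLatencyMs);
        System.out.printf("~%.0f random IOPS per node%n", nodeIops); // ~358
    }
}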
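Two follow-ups while I have you. First, to check I follow the point that
random writes are impossible in HDFS: as I read the
org.apache.hadoop.fs.FileSystem API, create() and append() hand back a
purely sequential stream, so a sketch like this (file path made up by me)
is the only write pattern available:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendOnlyDemo {
    public static void main(String[] args) throws Exception {
        // Binds to whatever fs.default.name points at (HDFS in production).
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(new Path("/tmp/append-only-demo"));
        out.writeBytes("writes can only go at the end of the file\n");
        out.close();
        // FSDataOutputStream has no seek-for-write, which is presumably why
        // the write path is log-structured: an append-only commit log plus
        // immutable store files.
    }
}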
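Second, on the Hadoop FileSystem interface: am I right that, mechanically
at least, pointing HBase at a different backend is just a matter of the
hbase.rootdir URI scheme? Something like this in hbase-site.xml (hostname
and port made up by me):

<property>
  <name>hbase.rootdir</name>
  <!-- The URI scheme selects the FileSystem implementation; hdfs:// is
       the supported case, anything else is your "in theory" territory. -->
  <value>hdfs://namenode.example.com:9000/hbase</value>
</property>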
cheers
Robert

-----Original Message-----
From: Todd Lipcon [mailto:t...@cloudera.com]
Sent: Sat 5/15/2010 3:51 AM
To: hbase-user@hadoop.apache.org
Subject: Re: Using HBase on other file systems

On Fri, May 14, 2010 at 2:15 PM, Gibbon, Robert, VF-Group <robert.gib...@vodafone.com> wrote:

> Hmm. What level of IOPS does HBase need in order to support a reasonably
> responsive level of service? How much latency in transfer times is
> acceptable before the nodes start to fail? Do you use asynchronous IO
> queueing? Write-through caching? Prefetching?

Hi Robert. Have you read the Bigtable paper? It's a good description of the
general IO architecture of BigTable. You can also read the original paper
on log-structured merge tree storage from back in the 90s.

To answer your questions in brief:

- Typical clusters run on between 4 and 12x 7200RPM SATA disks. Some people
run on 10k disks to get more random reads per second, but it's not
necessary.
- Latency in transfer times is a matter of what your application needs, not
a matter of what HBase needs.
- No, we do not asynchronously queue reads - AIO support is lacking in
Java 6, and even in the current previews of Java 7 it is a thin wrapper
around thread pools and synchronous IO APIs.
- HBase uses log-structured storage, which is somewhat similar to
write-through caching. We never do random writes (in fact, they're
impossible in HDFS).

-Todd

> On Fri, May 14, 2010 at 12:02 PM, Gibbon, Robert, VF-Group <robert.gib...@vodafone.com> wrote:
>
> > My thinking is around separation of concerns - at an OU level, not just
> > at a system integration level. Walrus gives me a consistent, usable
> > abstraction layer to transparently substitute the storage
> > implementation - for example from Symmetrix <--> Isilon or anything in
> > between. Walrus is storage-subsystem agnostic, so it need not be
> > configured for inconsistency like the Amazon service it emulates.
> >
> > Tight coupling for lock-in is a great commercial technique often seen
> > with suppliers. But it is a bad one. Very bad.
>
> However, reasonably tight coupling between a database (HBase) and its
> storage layer (HDFS) is IMHO absolutely necessary to achieve a certain
> level of correctness and performance. In HBase's case we use the Hadoop
> FileSystem interface, so in theory it will work on anything that has
> implemented said interface, but I wouldn't run a production instance on
> anything but HDFS.
>
> It's worth noting that most commercial databases operate on direct block
> devices rather than on top of filesystems, so that they don't have to
> deal with the varying semantics/performance of ext3, ext4, xfs, ufs, and
> the myriad other single-node filesystems that exist.
>
> -Todd
>
> > -----Original Message-----
> > From: Andrew Purtell [mailto:apurt...@apache.org]
> > Sent: Thu 5/13/2010 11:54 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: RE: Using HBase on other file systems
> >
> > You really want to run HBase backed by Eucalyptus' Walrus? What do you
> > have behind that?
> >
> > > From: Gibbon, Robert, VF-Group
> > > Subject: RE: Using HBase on other file systems
> > [...]
> > > NB. I checked out running HBase over Walrus (an AWS S3
> > > clone): bork - you want me to file a Jira on that?
>
> --
> Todd Lipcon
> Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera