Ok, I will read the paper again in more detail. It would be a big help if you published some recommended baseline deployment specs for HBase for typical OLTP and OLAP configurations. Maybe you already did and I missed them.

take it easy
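For reference, a back-of-envelope sketch of the spindle arithmetic quoted further down the thread. The ~90 random IOPS per 7200RPM SATA disk figure is an assumption inferred from the "4x 7200 spindles = approx 360 IOPS" estimate below, not a measured number, and the cluster size is made up purely for illustration:

// Rough sizing sketch: random-read capacity per node from spindle count.
public class IopsEstimate {
    public static void main(String[] args) {
        int spindlesPerNode = 4;   // JBOD, no RAID, as in the thread below
        int iopsPerSpindle = 90;   // assumed rough figure for a 7200RPM SATA disk
        int nodes = 10;            // hypothetical cluster size, illustration only

        int perNode = spindlesPerNode * iopsPerSpindle;
        System.out.println("Random reads/sec per node:    " + perNode);          // ~360
        System.out.println("Random reads/sec per cluster: " + perNode * nodes);  // ~3600
    }
}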
-----Original Message-----
From: Todd Lipcon [mailto:t...@cloudera.com]
Sent: Sun 5/16/2010 12:27 AM
To: hbase-user@hadoop.apache.org
Subject: Re: Using HBase on other file systems

On Sat, May 15, 2010 at 1:19 PM, Gibbon, Robert, VF-Group <
robert.gib...@vodafone.com> wrote:
>
> Todd thanks for replying. 4x 7200 spindles and no RAID = approx 360 IOPS
> to/from the backend storage, minimum and per node, to run an HBase cluster.
>

If you want to achieve 360 random reads per second per node, then yes :) If
you're only doing scans, or you're rarely reading (e.g. an archival storage
system), then you hardly need any random read capacity at all.

My laptop may have 4G of RAM, but does that mean that all laptops need 4G to
work? Only if you want to put 4G of data in memory!

-Todd

> -----Original Message-----
> From: Todd Lipcon [mailto:t...@cloudera.com]
> Sent: Sat 5/15/2010 3:51 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Using HBase on other file systems
>
> On Fri, May 14, 2010 at 2:15 PM, Gibbon, Robert, VF-Group <
> robert.gib...@vodafone.com> wrote:
>
> > Hmm. What level of IOPS does HBase need in order to support a reasonably
> > responsive level of service? How much latency in transfer times is
> > acceptable before the nodes start to fail? Do you use asynchronous IO
> > queueing? Write-through caching? Prefetching?
>
> Hi Robert. Have you read the Bigtable paper? It's a good description of the
> general IO architecture of Bigtable. You can also read the original paper on
> log-structured merge tree storage from back in the 90s.
>
> To answer your questions in brief:
> - Typical clusters run on between 4 and 12x 7200RPM SATA disks. Some people
> run on 10k disks to get more random reads per second, but it's not necessary.
> - Latency in transfer times is a matter of what your application needs, not
> a matter of what HBase needs.
> - No, we do not asynchronously queue reads - AIO support is lacking in Java
> 6, and even in the current previews of Java 7 it is a thin wrapper around
> threadpools and synchronous IO APIs.
> - HBase uses log-structured storage, which is somewhat similar to
> write-through caching. We never do random writes (in fact, they're
> impossible in HDFS).
>
> -Todd
>
> > On Fri, May 14, 2010 at 12:02 PM, Gibbon, Robert, VF-Group <
> > robert.gib...@vodafone.com> wrote:
> >
> > > My thinking is around separation of concerns - at an OU level, not just
> > > at a system integration level. Walrus gives me a consistent, usable
> > > abstraction layer to transparently substitute the storage implementation
> > > - for example from symmetrix <--> isilon or anything in between. Walrus
> > > is storage subsystem agnostic, so it need not be configured for
> > > inconsistency like the Amazon service it emulates.
> > >
> > > Tight coupling for lock-in is a great commercial technique often seen
> > > with suppliers. But it is a bad one. Very bad.
> >
> > However, reasonably tight coupling between a database (HBase) and its
> > storage layer (HDFS) is IMHO absolutely necessary to achieve a certain
> > level of correctness and performance. In HBase's case we use the Hadoop
> > FileSystem interface, so in theory it will work on anything that has
> > implemented said interface, but I wouldn't run a production instance on
> > anything but HDFS.
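A minimal sketch of the FileSystem interface point made just above, assuming a Hadoop client on the classpath and a reachable namenode; the hostname, port, and paths are placeholders, and this is illustrative code, not HBase internals. Swapping the URI scheme is all it takes to point the same code at a different implementation, which is why HBase can in theory run on anything that implements the interface:

// Illustrative only: the org.apache.hadoop.fs.FileSystem abstraction that
// HBase writes against. Any registered scheme (hdfs://, file://, s3://, ...)
// resolves to a concrete implementation behind the same API.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsAbstractionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder URI; change the scheme/authority to target another store.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);

        // Sequential, append-style write - the only write pattern HDFS offers,
        // which is why HBase's on-disk layout is log-structured.
        FSDataOutputStream out = fs.create(new Path("/tmp/fs-sketch/edit.log"));
        out.writeBytes("put row1 cf:qual value\n");
        out.close();
        fs.close();
    }
}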
> >
> > It's worth noting that most commercial databases operate on direct block
> > devices rather than on top of filesystems, so that they don't have to deal
> > with the varying semantics and performance of ext3, ext4, xfs, ufs, and the
> > myriad other single-node filesystems that exist.
> >
> > -Todd
> >
> > > -----Original Message-----
> > > From: Andrew Purtell [mailto:apurt...@apache.org]
> > > Sent: Thu 5/13/2010 11:54 PM
> > > To: hbase-user@hadoop.apache.org
> > > Subject: RE: Using HBase on other file systems
> > >
> > > You really want to run HBase backed by Eucalyptus' Walrus? What do you
> > > have behind that?
> > >
> > > > From: Gibbon, Robert, VF-Group
> > > > Subject: RE: Using HBase on other file systems
> > > [...]
> > > > NB. I checked out running HBase over Walrus (an AWS S3
> > > > clone): bork - you want me to file a Jira on that?
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>
> --
> Todd Lipcon
> Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera
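As a footnote to the log-structured storage point in the thread above, a toy sketch (my own illustration, not HBase internals) of why an LSM-style write path never needs random writes: edits collect in a sorted in-memory buffer and are flushed as new, sequentially written files, and existing files are never modified in place, which is all that a filesystem like HDFS has to support.

// Toy log-structured write path: append-only flushes, no in-place updates.
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

public class LsmWriteSketch {
    private final TreeMap<String, String> memstore = new TreeMap<String, String>();

    public void put(String rowKey, String value) {
        memstore.put(rowKey, value);   // buffered in memory, sorted by key
    }

    // Flush writes the whole buffer sequentially to a brand-new file; nothing
    // already on disk is ever rewritten in place.
    public void flush(String fileName) throws IOException {
        BufferedWriter out = new BufferedWriter(new FileWriter(fileName));
        try {
            for (Map.Entry<String, String> e : memstore.entrySet()) {
                out.write(e.getKey() + "\t" + e.getValue() + "\n");
            }
        } finally {
            out.close();
        }
        memstore.clear();
    }
}

Reads in such a scheme merge the in-memory buffer with the flushed files, and periodic compactions rewrite those files, again as purely sequential IO.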