I'd say that the current state of Hbase is more suited to offline processing 
than to online serving duties, but I do envision that the roadmap for Hbase 
could extend to cover those capabilities. Currently, however, Michael and Jim 
are spending most of their time stabilizing the core of the system and working 
on basic performance bottlenecks, especially as several large-scale Hbase 
installations are starting to pop up and file issues.

Here are some of the things that I think would move Hbase in the right 
direction for online serving:


 1.  Atomic appends for a single writer (HADOOP-1700): Without atomic appends 
to the commit log, durability is not guaranteed. This is a pressing issue in 
any case, since it also affects any offline processing use case that requires 
a 100% guarantee on durability.
 2.  Real-time master failover: We need to make sure there is zero downtime on 
failure of the HDFS master or the Hbase master. Perhaps the Zookeeper project 
will provide the key part of the solution, although I don't have much 
visibility into where Zookeeper stands and what its roadmap looks like. Can 
anyone say anything more?
 3.  More performance work: Michael did some performance measurements a while 
back that seemed to indicate a lot of time spent back-and-forth in RPC. We're 
exploring Thrift as a lighter-weight RPC mechanism, but there are probably 
other things to be done to reduce this cost. More analysis and measurement 
would be helpful.
 4.  Tighter integration between HDFS and Hbase: Preferring to run each region 
server on a node that holds one of the HDFS replicas of the data it serves 
would lower latency.
 5.  Memory caching: Instead of pinning a whole Hbase table in RAM, I'd 
recommend the use of memcached in front of Hbase to provide cached read access.
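
To illustrate item 5, here is a minimal sketch of the read path with a cache in 
front of the store, in Python. It is not Hbase or memcached client code: 
ReadThroughCache, fetch_row, and the TTL value are hypothetical names chosen for 
illustration, the cache is an in-process dict standing in for memcached, and 
fetch_row stands in for whatever call reads a row from Hbase.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ReadThroughCache:
    """Check the cache first; on a miss, read the backing store and cache the row.

    Assumption: `fetch_row` is a stand-in for a row lookup against the backing
    store, and the dict below is a stand-in for a memcached cluster.
    """

    def __init__(
        self,
        fetch_row: Callable[[str], Optional[dict]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_row = fetch_row
        self._ttl = ttl_seconds
        self._clock = clock
        # row key -> (expiry time, cached row)
        self._entries: Dict[str, Tuple[float, dict]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, row_key: str) -> Optional[dict]:
        now = self._clock()
        entry = self._entries.get(row_key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: go to the backing store.
        self.misses += 1
        row = self._fetch_row(row_key)
        if row is not None:
            self._entries[row_key] = (now + self._ttl, row)
        return row

    def invalidate(self, row_key: str) -> None:
        """Drop a cached row, e.g. right after writing it to the backing store."""
        self._entries.pop(row_key, None)


if __name__ == "__main__":
    table = {"user:42": {"name": "Mike"}}  # pretend this is the backing table
    cache = ReadThroughCache(table.get, ttl_seconds=30.0)
    cache.get("user:42")  # first read goes to the backing store
    cache.get("user:42")  # second read is served from the cache
    print(cache.hits, cache.misses)
```

The TTL bounds how stale a cached row can get if a writer forgets to invalidate 
it; a page that can tolerate slightly stale rows can use a longer TTL and take 
more read load off the region servers.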

Once these things are in place, Hbase could provide a reasonably performant 
large-scale online serving system. The main advantages of such a system would 
be its flexible schema, automatic repartitioning, and centralized 
administration, especially when compared with a system based around many 
separate MySQL instances with memcached in front of them. It would not have 
full ACID properties but there are many interesting applications that don't 
require strong guarantees in those areas.

Anyone who'd like to start tackling any of the above items should feel free to 
chime in here or jump on the Hbase IRC channel. More contributors are always 
welcome!

Chad Walters
Search Architect
Powerset

> Date: Fri, 30 Nov 2007 09:50:19 -0800
> Subject: Re: Hbase for dynamic web site?
> From: [EMAIL PROTECTED]
> To: [email protected]
>
>
> Are you already using memcache and related approaches?
>
>
> On 11/30/07 9:46 AM, "Mike Perkowitz" wrote:
>
>>
>>
>> Hello! We have a web site currently built on linux/apache/mysql/php. Most
>> pages do some mysql queries and then stuff the results into php/html
>> templates. We've been hitting the limits of what our database can handle,
>> and what we can do in realtime for the site. Our plan is to move our data
>> over to Hbase, precomputing as much as we can (some queries we currently do
>> with joins in mysql, for example). Our pages would then be pulling rows from
>> Hbase to stuff into templates.
>>
>>
>>
>> We're still working on getting Hbase working with the amount of data we want
>> to be able to handle, so haven't yet been able to test it for performance.
>> Is anyone else using Hbase in this way, and what has been your experience
>> with realtime performance? I haven't really seen examples of people using
>> Hbase this way - another approach would be for us to use
>> Hadoop/Hbase/mapreduce for computation then put results back into mysql or
>> whatever for realtime access. Any experience or suggestions would be
>> appreciated!
>>
>>
>>
>> Thanks,
>>
>> Mike
>>
>>
>>
>
