Hi there-

There is a section in the hbase book on pre-creating regions.
http://hbase.apache.org/book.html#precreate.regions
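
If it helps, here is a minimal sketch of what that section describes, using
the 0.90-era Java admin API. The table name ("myTable"), family ("f"), and
split points are made-up assumptions; choose split keys that match your own
row-key distribution.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // "myTable" and family "f" are hypothetical names.
        HTableDescriptor desc = new HTableDescriptor("myTable");
        desc.addFamily(new HColumnDescriptor("f"));

        // Nine split points -> ten regions. This assumes row keys that
        // are roughly uniform decimal strings; adapt to your key design.
        byte[][] splits = new byte[9][];
        for (int i = 0; i < splits.length; i++) {
          splits[i] = Bytes.toBytes(String.valueOf(i + 1));
        }
        admin.createTable(desc, splits);
      }
    }

The point is just that the initial load then fans out across ten regions
(and their region servers) instead of hammering a single one.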


-----Original Message-----
From: Hiller, Dean x66079 [mailto:dean.hil...@broadridge.com] 
Sent: Friday, June 17, 2011 4:28 PM
To: user@hbase.apache.org
Subject: RE: hbase architecture question

How do you pre-split tables, and how big should the splits be? We will be
doing a 3 terabyte load into HBase in the near future.

We have raw files exported from our Sybase that I can load once into Hadoop
DFS, so we can wipe HBase and reload the data into HBase on every run of
our prototype.

Is anyone else doing prototypes with massive data who needs to clean out
and restore their data to the database, like we have to do to prove this
works?

Thanks,
Dean

-----Original Message-----
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel 
Cryans
Sent: Wednesday, June 15, 2011 11:42 AM
To: user@hbase.apache.org
Subject: Re: hbase architecture question

Inline.

J-D

On Tue, Jun 14, 2011 at 6:48 PM, Sam Seigal <selek...@yahoo.com> wrote:
> Hi All,
>
> I had some questions about the hbase architecture that I am a little 
> confused about.
>
> After reading over the internet, the HBase book, etc., my
> understanding of regions is the following ->
>
> When the cluster initially starts up (with no data), the regionservers 
> come online.
>
> When the data starts flowing into HBase, the data starts being written 
> into a single region. Each region is composed of a memstore, 
> storefiles and a WAL.

There's only one WAL per region server and one memstore per family per region. 
See 
http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html

> As memstores get filled up, they are written to storefiles on disk. A 
> major compaction combines these storefiles together into a single 
> store file. Storefiles/memstores exist on a per column family basis.
>
> What I do not understand is the following - the StoreFiles for a 
> particular region/column family are written into HDFS as an HFile. 
> This HFile then exists in HDFS as chunks (usually of 64 MB size) across the 
> HDFS data nodes.
> When the region "splits", what happens to these store files ?

They are each split in two by the daughter regions. Right after the split
the daughters only hold reference files pointing at the top and bottom
halves of the parent's store files; the data is actually rewritten into new
files during the compaction that follows.

>
> If a region is already split into multiple store files and a 
> compaction runs and a split was supposed to occur, the compaction 
> process will only compact half of the files ?

You can't compact and split at the same time.

> If a region only has a single store file, how is the data split into two ?
> Is half of the data written into a separate file and then the original 
> file truncated ? Isn't this an expensive operation ?

Like I said previously, the daughter regions will each rewrite their half
of the files into new files. It does use disk, yes, but you shouldn't have
more than a few splits per day. That's also one of the reasons why we
recommend pre-splitting tables when doing an initial insert.

>
> Secondly, what does it mean to "move" a region from one regionserver 
> to another. Since a region is composed of a set of HFiles on the HDFS 
> filesystem, and can potentially exist anywhere across the datanode 
> cluster, is any data physically being moved from one machine to 
> another ? or is it just that the regionserver that the region has 
> "moved" to takes control of the file on HDFS that needs to be written to ?

The responsibility is moved.
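
To make that concrete, here's a hedged sketch (not from the original
thread) of a move through HBaseAdmin: the master simply reassigns the
region to another region server, and no HDFS blocks are copied. The
encoded region name and destination server name below are made up.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MoveRegion {
      public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        // Hypothetical encoded region name and "host,port,startcode"
        // destination server; only the assignment changes.
        admin.move(Bytes.toBytes("a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6"),
                   Bytes.toBytes("node2.example.com,60020,1308000000000"));
      }
    }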

>
> I guess what I am not able to fully understand is how HBase
> effectively uses HDFS when it comes to region distribution. If a
> regionserver is hosting a hot region whose blocks sit on a certain set
> of datanodes, how does moving this region to another regionserver help?
> The traffic ultimately would land on the same set of datanodes
> holding the different chunks of the region .. shouldn't HBase then be
> doing load balancing based on where the regions are physically located
> on the datanodes as opposed to where they are *logically* located?

It's more complicated than that.

When you write to HDFS, the first replica always goes to the local
datanode (if there is one) and the other two go elsewhere. It means that a
single biggish store file can be spread across almost all the nodes in a
cluster. Moving the region effectively forces the DFSClient to read from
different locations rather than just the local one.
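
If you want to see that spread for yourself, here is a small sketch using
the plain HDFS client API that lists which datanodes hold each block of a
store file. The path is a made-up example of the
/hbase/<table>/<region>/<family>/<file> layout.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.util.Arrays;

    public class StoreFileBlocks {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical store file path.
        Path p = new Path("/hbase/myTable/a1b2c3d4e5f6/f/3141592653589");
        FileStatus st = fs.getFileStatus(p);
        // Print each block's offset and the datanodes holding replicas.
        for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
          System.out.println(loc.getOffset() + " -> "
              + Arrays.toString(loc.getHosts()));
        }
      }
    }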

Another thing to keep in mind is flushing/compacting, which should
eventually make the data local again at the region's new location, since
newly written files get their first replica on the local datanode.
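
For instance, a minimal sketch of forcing that along with a major
compaction (the table name is hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class ForceMajorCompaction {
      public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        // Asynchronously queues a major compaction for every region of
        // the (hypothetical) table; the rewritten files then get a
        // local replica on each region's current host.
        admin.majorCompact("myTable");
      }
    }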

Then again, maybe it's not even the IO that's causing the issue: maybe the
region is thrashing the block cache, or needs a lot of CPU for various
reasons, or is being written to heavily and keeps flushing and compacting.

Even then, it's not ideal. Load balancing is still very dumb in HBase and
there's a lot more work to do. The kind of work I'd like to see a few PhD
students pick up :)

>
> Please let me know if I am missing something truly fundamental here.

Nothing fundamental, but a lot of the answers you were seeking can be found in 
the code. It's actually quite easy to read.

> Sam
>
