Re: HBase Write to Regionservers behavior

Billy Pearson Thu, 11 Jun 2009 20:48:14 -0700

Once the table has split more, you might look into using
org.apache.hadoop.hbase.mapred.HRegionPartitioner.java

It will split up the data and run only one reduce per region, so all of
that region's rows will be sent to just one reducer. It does not help
much, though, when the table is small and you have a lot of reduce tasks.

It has benefits: once one region is done, that region will likely be
flushed as the memcache gets full and has to start flushing. So it can
start compactions and splits without having to worry about more data
coming.

Right now all the reduces will sort the data by key, so all the reduce
tasks will start writing to the same regions as they go: because the
data is sorted, they all start from the first of the table and work
toward the last.
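To make the idea concrete, here is a rough sketch of what partitioning by region amounts to. This is not the HRegionPartitioner source; the class name, method names, and the sample region start keys are made up for illustration, and the modulo fallback for "more regions than reducers" is an assumption of the sketch. It also compares keys as Java strings, which only matches HBase's byte ordering for plain ASCII keys like the hex ones in this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the HBase class: sends every row of a region to one reducer.
final class RegionPartitionSketch {

    // Index of the region whose key range contains rowKey.
    // Assumes startKeys is sorted ascending and its first entry is ""
    // (the start key of the first region).
    static int regionIndexFor(String rowKey, List<String> startKeys) {
        int index = 0;
        for (int i = 0; i < startKeys.size(); i++) {
            if (startKeys.get(i).compareTo(rowKey) <= 0) {
                index = i;   // rowKey is at or past this region's start key
            } else {
                break;       // later regions start after rowKey
            }
        }
        return index;
    }

    // Reducer number for rowKey. Wrapping with modulo when there are more
    // regions than reducers is an assumption of this sketch.
    static int partitionFor(String rowKey, List<String> startKeys, int numReducers) {
        return regionIndexFor(rowKey, startKeys) % numReducers;
    }

    public static void main(String[] args) {
        // Made-up start keys for a table that has split into four regions.
        List<String> startKeys = Arrays.asList("", "4", "8", "C");
        String[] rows = {"055E51294F9D9CA331D968D04B72A11C", "7FFF", "C000"};
        for (String row : rows) {
            System.out.println(row + " -> reducer " + partitionFor(row, startKeys, 7));
        }
    }
}
```

With only one region (a single start key of ""), every row lands on reducer 0, which is why this does not buy much until the table has split.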


Billy

"Bradford Stephens"<[email protected]> wrote in messagenews:[email protected]...

Hey there,

So, I wiped my HDFS and reinstalled everything, and am running smaller
loads... so far, so good. I've got 7 regionservers.

My job basically takes a lot of documents and metadata with unique
binary keys (like "055E51294F9D9CA331D968D04B72A11C"), combines them
all in a reducer, then writes it to HBase.

What I'm noticing is that it's writing to mostly one or two regions on
one box at a time, even though I have 7 reducers running. Monitoring
everything with dstat -v, I notice that only 2 of my servers are doing
much. These boxes have very low CPU idling, and high disk output (a
few GB a minute).

Everything else has a little bit of disk activity (maybe 500
MB/minute), but very idle CPUs.

Is this normal behavior? I guess as more data is loaded, more
regions are split, so over time, more boxen will be loading
data?

Cheers,
Bradford
