Dru Jensen wrote:
Stack,

Sorry for the confusion, I am not using the old implementation of TableReduce. The new 0.19.0 changed this to an interface. The reduce process is performing calculations. It's not just writing to the table, and it requires the sort.
Or try running with even more reducers so loading is spread more evenly?

I will change the region size back and see if that helps. If I find that I need a larger region, should I change the flush by the same multiple?
Yes.
St.Ack

thanks,
Dru

On Oct 23, 2008, at 2:18 PM, stack wrote:

Any reason you need to use TableReduce? If you delay the insert into hbase till reduce-time, it means 1) the MR framework has spent a bunch of resources shuffling and sorting your data, a sort that is going to happen on hbase insert anyway, and 2) your inserts are going into hbase in order, so you pound one region rather than insert across all. You might try inserting into hbase at the tail of your map task and output nothing (or something small to keep up the job counters).
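
To make the "pound one region" point concrete, here is a toy model rather than real HBase code. It assumes a hypothetical table whose keys 0..999 are split evenly across 10 regions, and counts how many distinct regions each batch of 100 consecutive writes touches. The class and method names are made up for illustration. Sorted keys stand in for reduce-time inserts, shuffled keys for inserts done at the tail of the map tasks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class WriteSpread {
    /** Hypothetical layout: keys 0..999 split evenly across `regions` regions. */
    static int regionFor(int key, int regions) {
        return key * regions / 1000;
    }

    /** Average number of distinct regions touched per window of consecutive writes. */
    static double avgRegionsPerWindow(List<Integer> keys, int regions, int window) {
        int windows = 0;
        int touched = 0;
        for (int start = 0; start + window <= keys.size(); start += window) {
            Set<Integer> seen = new HashSet<>();
            for (int key : keys.subList(start, start + window)) {
                seen.add(regionFor(key, regions));
            }
            touched += seen.size();
            windows++;
        }
        return (double) touched / windows;
    }

    /** Keys 0..999 in ascending order, as they would arrive after the MR sort. */
    static List<Integer> sortedKeys() {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add(i);
        }
        return keys;
    }

    /** The same keys in an arbitrary (seeded, repeatable) order, as from map tasks. */
    static List<Integer> shuffledKeys(long seed) {
        List<Integer> keys = sortedKeys();
        Collections.shuffle(keys, new Random(seed));
        return keys;
    }

    public static void main(String[] args) {
        System.out.println("sorted:   " + avgRegionsPerWindow(sortedKeys(), 10, 100));
        System.out.println("shuffled: " + avgRegionsPerWindow(shuffledKeys(42L), 10, 100));
    }
}
```

In this model each sorted batch lands entirely in one region, while a shuffled batch is spread over many. A real cluster also has several reducers and uneven region boundaries, so treat this as a picture of the effect, not a measurement.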

Are your rows > 256MB? At the moment at least, there needs to be a bit of balance maintained between flushing, compacting and splitting. The defaults do that. I'm not sure what happens when you double the max filesize but not correspondingly the flushsize. You might try restoring the default (hbase will not try to split a row if it's > the configured max filesize).
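
As a rough sketch of keeping the two settings in balance, the arithmetic below scales the flush size by the same factor as the max filesize. The 256MB max filesize is the default mentioned above. The 64MB flush default and the property names in the comments (`hbase.hregion.max.filesize`, `hbase.hregion.memcache.flush.size`) are assumptions from memory for this release, so check them against your own hbase-default.xml. The class name is made up for illustration.

```java
public class RegionSizing {
    // Default max filesize, 256MB (assumed property: hbase.hregion.max.filesize).
    static final long DEFAULT_MAX_FILESIZE = 256L * 1024 * 1024;
    // Assumed default flush size, 64MB (assumed property: hbase.hregion.memcache.flush.size).
    static final long DEFAULT_FLUSH_SIZE = 64L * 1024 * 1024;

    /** Scale the flush size by the same multiple that was applied to the max filesize. */
    static long scaledFlushSize(long newMaxFilesize) {
        // Multiply before dividing so non-integer multiples are not truncated early.
        return newMaxFilesize * DEFAULT_FLUSH_SIZE / DEFAULT_MAX_FILESIZE;
    }

    public static void main(String[] args) {
        long doubledMaxFilesize = 2 * DEFAULT_MAX_FILESIZE;
        System.out.println("max filesize: " + doubledMaxFilesize);
        System.out.println("flush size:   " + scaledFlushSize(doubledMaxFilesize));
    }
}
```

Under those assumed defaults, doubling the max filesize to 512MB would go with a 128MB flush size.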

St.Ack

