Dru Jensen wrote:
Stack,
Sorry for the confusion; I am not using the old implementation of
TableReduce. The new 0.19.0 changed this to an interface. The reduce
process is performing calculations; it's not just writing to the
table, and it requires the sort.
Or try running with even more reducers so loading is spread more evenly?
I will change the region size back and see if that helps. If I find
that I need a larger region, should I change the flush by the same
multiple?
Yes.
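For example, doubling the region max filesize from the 256MB default to
512MB would mean doubling the flush size from 64MB to 128MB as well. A
sketch of the hbase-site.xml overrides; the property names below are
from the 0.19-era hbase-default.xml, so verify them against your
release:

```xml
<!-- Sketch only: double the default region max filesize (256MB -> 512MB)
     and scale the flush size by the same multiple (64MB -> 128MB). -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>536870912</value>
</property>
<property>
  <name>hbase.hregion.memcache.flush.size</name>
  <value>134217728</value>
</property>
```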
St.Ack
thanks,
Dru
On Oct 23, 2008, at 2:18 PM, stack wrote:
Any reason you need to use TableReduce? If you delay the insert into
hbase till reduce-time, it means 1) the MR framework has spent a
bunch of resources shuffling and sorting your data, a sort that is
going to happen on hbase insert anyway, and 2) your inserts are
going into hbase in row order, so you pound one region rather than
inserting across all of them. You might try inserting into hbase at
the tail of your map task and outputting nothing (or something small
to keep up the job counters).
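As a rough illustration of that pattern (a sketch only, against the
0.19-era HTable/BatchUpdate client API; the table name, column, and
row-key extraction are placeholders, and it needs the hadoop and hbase
jars on the classpath):

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch: do the work and the hbase insert inside the map task and emit
// nothing, so the MR shuffle/sort is skipped and writes hit regions in
// input order rather than sorted row-key order.
public class MapSideUpload extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private HTable table;

  public void configure(JobConf job) {
    try {
      // "mytable" is a placeholder table name.
      table = new HTable(new HBaseConfiguration(), "mytable");
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> collector, Reporter reporter)
      throws IOException {
    // Hypothetical: first tab-separated field of the line is the row key.
    String row = value.toString().split("\t", 2)[0];
    BatchUpdate update = new BatchUpdate(row);
    update.put("contents:line", value.toString().getBytes()); // placeholder column
    table.commit(update);
    // Emit nothing; a reporter counter can track progress instead.
  }
}
```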
Are your rows > 256MB? At the moment at least, there needs to be a
bit of balance maintained between flushing, compacting and
splitting. The defaults do that. I'm not sure what happens when you
double the max filesize but not correspondingly the flushsize. You
might try restoring the defaults (hbase will not try to split a
row if it's > the configured max filesize).
St.Ack