Stack,
Sorry for the confusion. I am not using the old implementation of
TableReduce; the new 0.19.0 release changed it to an interface. The
reduce step performs calculations. It is not just writing to the
table, and it requires the sort.
I will change the region size back and see if that helps. If I find
that I need a larger region, should I change the flush size by the
same multiple?
thanks,
Dru
On Oct 23, 2008, at 2:18 PM, stack wrote:
Any reason you need to use TableReduce? If you delay the insert
into hbase till reduce-time, it means 1) the MR framework has
spent a bunch of resources shuffling and sorting your data, a sort
that is going to happen on hbase insert anyway, and 2) your
inserts go into hbase in order, so you pound one region rather
than inserting across all of them. You might try inserting into
hbase at the tail of your map task and outputting nothing (or
something small to keep up the job counters).
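
To illustrate point 2 above, here is a minimal, self-contained Java
sketch. It does not use the hbase API. The class InsertOrderSketch, the
integer row keys, and the three split points are all hypothetical,
chosen only to show how writes arriving in sorted order stay on one
region for long stretches while writes in arbitrary order spread
across regions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class InsertOrderSketch {
    // Hypothetical region lookup: region i holds the keys below splitPoints[i]
    // (and at or above the previous split point); the last region holds the rest.
    static int regionFor(int rowKey, int[] splitPoints) {
        int region = 0;
        while (region < splitPoints.length && rowKey >= splitPoints[region]) {
            region++;
        }
        return region;
    }

    // Longest stretch of consecutive writes that all landed on the same region.
    static int longestRun(List<Integer> keys, int[] splitPoints) {
        int best = 0;
        int run = 0;
        int previousRegion = -1;
        for (int key : keys) {
            int region = regionFor(key, splitPoints);
            run = (region == previousRegion) ? run + 1 : 1;
            previousRegion = region;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] splitPoints = {250, 500, 750}; // assumed: four regions of 250 keys each
        List<Integer> sorted = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            sorted.add(i);
        }
        // Stand-in for map-side order: same keys, no global sort.
        List<Integer> shuffled = new ArrayList<>(sorted);
        Collections.shuffle(shuffled, new Random(42));

        System.out.println("sorted order, longest run on one region:   "
                + longestRun(sorted, splitPoints));
        System.out.println("shuffled order, longest run on one region: "
                + longestRun(shuffled, splitPoints));
    }
}
```

With sorted keys the longest run covers one region's whole key range
(250 writes here); in shuffled order the runs are far shorter. Real map
output is not uniformly random, so treat this as a rough picture only.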
Are your rows > 256MB? At the moment at least, a bit of balance
needs to be maintained between flushing, compacting and
splitting. The defaults do that. I'm not sure what happens when
you double the max filesize but not correspondingly the flushsize.
You might try restoring the default (hbase will not try to split
a row if it is larger than the configured max filesize).
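
If you do end up raising the max filesize, one way to keep the two
settings in step is to scale the flush size by the same factor. The
arithmetic is sketched below in Java. The 256MB max filesize is the
default mentioned above; the 64MB default flush size is an assumption
on my part, so check it against your own hbase-default.xml. The class
and method names are hypothetical, and whether proportional scaling is
actually the right policy is untested.

```java
public class FlushRatioSketch {
    // Default max filesize referred to in this thread.
    static final long DEFAULT_MAX_FILESIZE = 256L * 1024 * 1024;
    // ASSUMPTION: default flush size of 64MB; verify against your hbase-default.xml.
    static final long ASSUMED_DEFAULT_FLUSH_SIZE = 64L * 1024 * 1024;

    // Scale the flush size so it keeps the same ratio to the max filesize
    // as the (assumed) defaults do. multiplyExact throws on overflow
    // instead of silently wrapping.
    static long scaledFlushSize(long newMaxFileSize) {
        return Math.multiplyExact(newMaxFileSize, ASSUMED_DEFAULT_FLUSH_SIZE)
                / DEFAULT_MAX_FILESIZE;
    }

    public static void main(String[] args) {
        long doubledMaxFileSize = 2 * DEFAULT_MAX_FILESIZE; // 512MB
        System.out.println("max filesize (bytes): " + doubledMaxFileSize);
        System.out.println("flush size (bytes):   " + scaledFlushSize(doubledMaxFileSize));
    }
}
```

Under those assumptions, doubling the max filesize to 512MB gives a
flush size of 128MB.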
St.Ack