Any reason you need to use TableReduce? If you delay the insert into
hbase till reduce-time, it means 1) the MR framework has spent a bunch
of resources shuffling and sorting your data, a sort that is going to
happen on hbase insert anyway, and 2) your inserts are going into
hbase in order, so you pound one region at a time rather than
spreading the inserts across all of them.
You might try inserting into hbase at the tail of your map task and
outputting nothing (or something small to keep up the job counters).
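
To make the "pound one region" point concrete, here is a toy simulation in plain Java. It does not touch the hbase API at all. The class RegionLoadSketch, its helper methods, and the numbers (8 regions, 1000 keys per region, windows of 100 inserts) are all made up for illustration, and it assumes the table already has several regions. It counts how many distinct regions get written to within each window of consecutive inserts, first with keys arriving sorted (as they would out of a reduce) and then in shuffled order (a rough stand-in for map-side order).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Toy model only: regions are modeled as equal-width ranges of integer row keys.
class RegionLoadSketch {

    // Hypothetical region lookup: region i holds keys [i * keysPerRegion, (i + 1) * keysPerRegion).
    static int regionFor(int key, int keysPerRegion) {
        return key / keysPerRegion;
    }

    // Builds the full key space in ascending order, i.e. the order a reducer would emit.
    static List<Integer> sortedKeys(int regions, int keysPerRegion) {
        List<Integer> keys = new ArrayList<>();
        for (int k = 0; k < regions * keysPerRegion; k++) {
            keys.add(k);
        }
        return keys;
    }

    // Average number of distinct regions written to per window of consecutive inserts.
    static double avgRegionsPerWindow(List<Integer> keys, int keysPerRegion, int window) {
        int windows = 0;
        int total = 0;
        for (int start = 0; start + window <= keys.size(); start += window) {
            Set<Integer> touched = new HashSet<>();
            for (int i = start; i < start + window; i++) {
                touched.add(regionFor(keys.get(i), keysPerRegion));
            }
            total += touched.size();
            windows++;
        }
        return (double) total / windows;
    }

    public static void main(String[] args) {
        int regions = 8;
        int keysPerRegion = 1000;
        int window = 100;

        // Reduce-side order: every window of inserts falls inside a single region.
        List<Integer> sorted = sortedKeys(regions, keysPerRegion);
        System.out.println("sorted:   " + avgRegionsPerWindow(sorted, keysPerRegion, window));

        // Map-side order, modeled here as a seeded shuffle (an assumption about the input).
        List<Integer> shuffled = new ArrayList<>(sorted);
        Collections.shuffle(shuffled, new Random(42));
        System.out.println("shuffled: " + avgRegionsPerWindow(shuffled, keysPerRegion, window));
    }
}
```

With these numbers the sorted case averages exactly one region per window, since each window of 100 keys sits inside one 1000-key range. The shuffled case spreads each window over several regions. How close real map input comes to the shuffled case depends on how your input is laid out.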
Are your rows > 256MB? At the moment at least, there needs to be a bit
of balance maintained between flushing, compacting and splitting. The
defaults do that. I'm not sure what happens when you double the max
filesize but not correspondingly the flushsize. You might try
restoring the default (hbase will not try to split a row if it's
larger than the configured max filesize).
St.Ack