Running a MapReduce job on a single node (as a test) in pseudo-distributed
mode.

The Map job is a little intense and generates 221 million output records.

The Reduce phase is a simple reduce using the TableReduceOutputFormat.

It's single-threaded during reduce because, as I understand it, TableReduce
uses only one reduce task per node (or is there some other reason that only
one reduce task gets spawned?).

In this configuration, it writes roughly 50-100 inserts per second (which
doesn't strike me as terrible), but given that the load average on the
machine is only 0.20, I don't understand what's keeping it from performing
better.
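
For a rough sense of scale, here is a back-of-the-envelope estimate. It
assumes, purely for illustration, that each of the 221 million map output
records ends up as one insert; the real insert count depends on how the
reduce groups keys and may well be lower. The function name is made up for
this sketch.

```python
# Back-of-the-envelope estimate only.
# Assumption: one insert per map output record (the real count may be lower).
RECORDS = 221_000_000
SECONDS_PER_DAY = 86_400


def days_to_load(records, inserts_per_second):
    """Return the wall-clock days needed to write `records` at a steady rate."""
    seconds = records / inserts_per_second
    return seconds / SECONDS_PER_DAY


for rate in (50, 100):
    print(f"{rate} inserts/s -> {days_to_load(RECORDS, rate):.1f} days")
```

Under that assumption, the load would take on the order of weeks at the
observed rates, which is why the idle machine is puzzling.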


On Tue, Apr 29, 2008 at 12:47 PM, stack <[EMAIL PROTECTED]> wrote:

> Daniel Leffel wrote:
>
>> However, it baffles me a little that it's so slow. I don't quite
>> understand
>> the bottleneck - load on the machine is 0.20 for a pure load.
>> Nevertheless,
>> large batch loads that run a little slow is WAY better to me than crashing
>> with OOME.
>>
> Tell us more about how you are running your upload Daniel?  Is it single
> client?  Single threaded?  What size are records?  What size cluster (pardon
> if you've already said)?
> Thanks,
> St.Ack
>
