I'm running a MapReduce job on a single node (as a test) in pseudo-distributed mode.
The map phase is fairly intensive and generates 221 million output records. The reduce phase is a simple reduce using the TableReduceOutputFormat. The reduce runs single-threaded because, as I understand it, TableReduce uses only one reduce task per node (or is there some other reason that only one reduce task is spawned?). In this configuration it writes roughly 50-100 inserts per second, which doesn't strike me as terrible, but given the load of only 0.20 on the machine, I don't understand what's keeping it from performing better.

On Tue, Apr 29, 2008 at 12:47 PM, stack <[EMAIL PROTECTED]> wrote:

> Daniel Leffel wrote:
>
>> However, it baffles me a little that it's so slow. I don't quite understand
>> the bottleneck - load on the machine is 0.20 for a pure load. Nevertheless,
>> large batch loads that run a little slow is WAY better to me than crashing
>> with OOME.
>>
> Tell us more about how you are running your upload Daniel? Is it single
> client? Single threaded? What size are records? What size cluster (pardon
> if you've already said)?
> Thanks,
> St.Ack
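
For scale, here is a rough back-of-envelope estimate of what the 50-100 inserts per second figure implies. It is only a sketch: it assumes (hypothetically) that each of the 221 million map output records turns into exactly one insert, which may overstate the work if the reduce collapses records by key. The function name `days_to_insert` is made up for illustration.

```python
# Back-of-envelope: how long a bulk load takes at a fixed insert rate.
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_insert(records: int, inserts_per_second: float) -> float:
    """Return the number of days needed to write `records` rows."""
    return records / inserts_per_second / SECONDS_PER_DAY


# Assumption: one insert per map output record (221 million in total).
records = 221_000_000

for rate in (50, 100):
    print(f"{rate} inserts/s: {days_to_insert(records, rate):.1f} days")
# prints:
# 50 inserts/s: 51.2 days
# 100 inserts/s: 25.6 days
```

Under that assumption the full load would take weeks at the observed rate, which is why the idle machine is the puzzling part.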
