On Thursday 23 August 2007, Doug Cutting wrote:
> Thorsten Schuett wrote:
> > During the copy phase of reduce, the cpu load was very low and vmstat
> > showed constant reads from the disk at ~15MB/s and bursty writes. At the
> > same time, data was sent over the loopback device at ~15MB/s. I don't see
> > what else could limit the performance here. The disk can certainly
> > provide the data at higher speeds.
>
> It can if the reads are sequential, but might not if they're random.
> That said, there could well be a Hadoop bottleneck here, but I still
> doubt that it is the loopback device, which is surely capable of greater
> than 15MB/s, no?

To me it looks as if the copy operation limits my reduce performance. But we can probably agree that it is not a good idea to copy files around when running on a single node, especially when using HTTP for copying.

Thorsten
