Sorry, I forgot that attachments don't work for me on the list...
I've opened an issue and put the attachments there. Issue #249
Improving Map -> Reduce performance..
thanx
ben
On May 23, 2006, at 4:00 PM, Ben Reed wrote:
Actually, these patches are really just to make Hadoop start
trotting. It is still at least an order of magnitude slower than it
should be, but I think these patches are a good start.
I've created two patches for clarity. They are not independent, but
could easily be made so.
The disk-zoom patch is a performance trifecta: less disk IO, less
disk space, and less CPU, for a tremendous overall improvement. The
patch is based on the following observation: every piece of data
from a map hits the disk once on the mapper and three times (plus
sorting) on the reducer. Further, the entire input for the reduce
step is sorted together, which maximizes the sort cost. This patch causes:
1) the mapper sorts its relatively small output fragments as it
writes them, which costs two hits to the disk, but on much smaller
files.
2) the reducer copies the map outputs and may merge a few of them
at copy time (if more than 100 outputs are present). No sorting is
needed, since the map outputs are already sorted.
3) the reducer merges the sorted map outputs on the fly, in memory,
at reduce time (a sketch of this merge follows the list).
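To make step 3 concrete, here is a minimal, self-contained sketch of
the idea, not the patch code itself (SortedRun and the other names
are hypothetical stand-ins): because each map output arrives already
sorted, the reducer can stream a k-way merge through a small heap
instead of sorting everything in one pass.

import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class StreamingMerge {

  /** One sorted map-output segment; assumed to yield keys in order. */
  interface SortedRun {
    boolean hasNext();
    String next();   // returns the next key in sorted order
  }

  /** Lazily merges pre-sorted runs; no global sort ever happens. */
  static Iterator<String> merge(List<SortedRun> runs) {
    // Heap entry: a run paired with its current smallest key.
    class Head {
      final SortedRun run;
      String key;
      Head(SortedRun r) { run = r; key = r.next(); }
    }
    PriorityQueue<Head> heap =
        new PriorityQueue<>((a, b) -> a.key.compareTo(b.key));
    for (SortedRun r : runs) {
      if (r.hasNext()) heap.add(new Head(r));
    }
    return new Iterator<String>() {
      public boolean hasNext() { return !heap.isEmpty(); }
      public String next() {
        Head h = heap.poll();          // run with the smallest head key
        String out = h.key;
        if (h.run.hasNext()) {         // advance that run and re-insert
          h.key = h.run.next();
          heap.add(h);
        }
        return out;
      }
    };
  }
}

The payoff is that reduce-time merging of k segments costs
O(n log k) with no extra pass over the disk.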
I'm attaching the performance graph (with just the disk-zoom patch)
to show the results. This benchmark uses random input and null
output to remove any DFS performance influences. The cluster of 49
machines I was running on had limited disk space, so on unmodified
Hadoop I could only run up to a certain input size. With the patch
we use 1/3 the disk space.
The second patch allows the TaskTracker to reuse processes to
avoid the overhead of starting the JVM. While JVM startup is
relatively fast, restarting a Task causes disk IO and DFS
operations that have a negative impact on the rest of the system.
When a Task finishes, rather than exiting, it reads the next task
to run from stdin. We still isolate the Task runtime from the
TaskTracker, but we only pay the startup penalty once.
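For the shape of that reuse loop, here is a minimal sketch under
assumed names (runTask and the line-per-task protocol are
illustrative, not the patch's actual wire format): the child JVM
loops on stdin instead of exiting after one task.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class TaskChild {
  public static void main(String[] args) throws IOException {
    BufferedReader in =
        new BufferedReader(new InputStreamReader(System.in));
    String taskLine;
    // One line from the TaskTracker names one task; EOF means shut down.
    while ((taskLine = in.readLine()) != null) {
      runTask(taskLine);                       // hypothetical: run one task
      System.out.println("DONE " + taskLine);  // assumed completion signal
    }
  }

  static void runTask(String description) {
    // ... deserialize the task and drive its map() or reduce() loop ...
  }
}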
This second patch also fixes two performance issues not related to
JVM reuse. (The reuse just makes the problems glaring.) First, the
JobTracker counts all jobs, not just the running jobs, when deciding
the load on a tracker. Second, the TaskTracker should really ask for
a new Task as soon as one finishes rather than waiting out the
10-second poll interval.
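A toy sketch of that second fix (names assumed, not from the
patch): instead of always sleeping the full 10-second interval, the
tracker's poll loop can be woken early when a task completes.

public class PollLoop {
  private final Object wakeup = new Object();
  private boolean taskJustFinished = false;    // guarded by wakeup

  // Called by the task runner thread when a task completes.
  void onTaskFinished() {
    synchronized (wakeup) {
      taskJustFinished = true;
      wakeup.notify();                         // cut the 10 s wait short
    }
  }

  void run() throws InterruptedException {
    while (true) {
      synchronized (wakeup) {
        if (!taskJustFinished) {
          wakeup.wait(10_000);                 // normal 10 s poll interval
        }
        taskJustFinished = false;
      }
      askJobTrackerForTask();                  // hypothetical request to the JobTracker
    }
  }

  void askJobTrackerForTask() { /* ... heartbeat / ask for a Task ... */ }
}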
I've been benchmarking the code a lot, but I don't have access to a
really good cluster to try the code out on, so please treat it as
experimental. I would love to get feedback.
There is another obvious thing to change: ReduceTasks should start
only after the first batch of MapTasks completes, so that 1) they
have something to do, and 2) they are running on the fastest
machines.
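As a toy illustration of that scheduling gate (reading "first
batch" as one full wave of map slots, which is my assumption, and
with hypothetical names):

public class ReduceGate {
  // Don't hand out ReduceTasks until one wave of MapTasks has finished,
  // so reducers have data to pull and slow machines stay on maps.
  static boolean canScheduleReduce(int finishedMaps, int mapSlotsPerWave) {
    return finishedMaps >= mapSlotsPerWave;
  }
}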
thanx
ben