Job Speed

Josh Ferguson Mon, 26 Jan 2009 23:28:08 -0800

So I have a table with roughly 145,000 records spread across 300 files. The total size is about 7 MB. Right now I'm running one jobtracker and one task tracker, which is a high-CPU Amazon box (1.7 GB of RAM, ~4 cores). I run the following query:


SELECT COUNT(DISTINCT(activities.actor_id)) FROM activities;

And it takes about 35 minutes to finish. One of my problems is that I can't get my task tracker to process more than one map at a time, even though its maximum number of map tasks is set higher. But even the map phase is relatively fast compared to the reduce, which takes about 30 minutes by itself. The status of the reduce task is:


reduce > copy (225 of 344 at 0.01 MB/s) >

I really don't understand what is going on during this copy step or why it is taking so long. The files are small and they're all inside of Amazon's network. Can you guys help me out?
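For a rough sense of scale, here is a back-of-the-envelope sketch using only the numbers above. It assumes the whole ~7 MB had to be copied at the rate shown in the status line, which is almost certainly a simplification of what the copy step actually does, and the displayed rate is itself rounded. The function names are just illustrative.

```python
# Figures taken from the post; the model (all data copied once at a
# constant rate) is an assumption made purely for illustration.
TOTAL_MB = 7.0             # total table size
NUM_FILES = 300            # number of input files
REPORTED_MB_PER_S = 0.01   # copy rate shown in the reduce task status


def average_file_kb(total_mb, num_files):
    """Average size of one input file, in kilobytes."""
    return total_mb * 1024 / num_files


def copy_time_seconds(total_mb, rate_mb_per_s):
    """Time to move total_mb at a constant rate, in seconds."""
    return total_mb / rate_mb_per_s


avg_kb = average_file_kb(TOTAL_MB, NUM_FILES)
copy_seconds = copy_time_seconds(TOTAL_MB, REPORTED_MB_PER_S)

print(f"average file size: {avg_kb:.1f} KB")
print(f"copy time at reported rate: {copy_seconds / 60:.1f} minutes")
```

Even under that pessimistic assumption the copy would come out well under the 30 minutes I'm seeing, which is part of why the reduce time confuses me.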


Josh F.
