But the reducers can do some preparation while the map phase is still
running: map output can already be distributed across the nodes that
will work as reducers.

Copying and sorting map output is also a time-consuming process (maybe
more time-consuming than the reduce itself). For example, a piece of a
job run log on a 40-node cluster could look like this:

09/08/27 11:08:24 INFO job.JobRunningListener:  map 36% reduce 10%
09/08/27 11:08:28 INFO job.JobRunningListener:  map 37% reduce 10%
09/08/27 11:08:29 INFO job.JobRunningListener:  map 37% reduce 11%

But if you run the job on a single-node cluster, reduce will start only after map has finished.
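
To illustrate the point, here is a minimal sketch in plain Python (not
Hadoop code; map_words, reduce_counts and run_job are hypothetical names
made up for this example). It assumes a word-count job with a single
reducer that copies map output as each map task finishes, and it shows
why calling reduce on what has been copied so far would give an
incomplete answer.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in line.split()]


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in sorted(grouped.items())}


def run_job(splits):
    """Simulate one reducer copying map output as map tasks finish."""
    copied = defaultdict(list)  # key -> values copied so far
    partial = None
    for done, split in enumerate(splits, start=1):
        # Copy phase: this can overlap with the remaining map tasks.
        for key, value in map_words(split):
            copied[key].append(value)
        if done == 1:
            # What reduce would produce if it ran after the first map only.
            partial = reduce_counts(copied)
    # Every value for every key is known only after the last map task.
    return partial, reduce_counts(copied)


partial, final = run_job(["a b a", "b c", "a c c"])
print("after first map:", partial)
print("after all maps: ", final)
```

The count for "a" after the first map differs from the final one, so
the reducer can collect data early but can only produce a correct
result once all map output has arrived.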

On Aug 27, 2009, at 4:31 PM, Harish Mallipeddi wrote:

On Thu, Aug 27, 2009 at 5:22 PM, Rakhi Khatwani <[email protected]> wrote:


But I want my reduce to run; that is, if 25% of the map is done, then I want the reduce to save that much data, so that even if the 2nd map fails, I don't lose data.
any pointers?
Regards,
Raakhi


What you're asking for will break the semantics of reduce(). Reduce can only
proceed after receiving all the map-outputs.

--
Harish Mallipeddi
http://blog.poundbang.in

---
Vladimir Klimontovich,
skype: klimontovich
GoogleTalk/Jabber: [email protected]
Cell phone: +7926 890 2349
