I think that was it, or close to it: it now goes through my Reducer code only twice instead of multiple times. I would like it to run just once, but I can perhaps live with that - after all, writing zip files by myself, outside of the Hadoop paradigm, may not be quite standard. The second concern is how to control this when executing on Amazon Map Reduce - I could not find a way.
Thanks!
Mark

On Wed, Jul 29, 2009 at 9:41 AM, Edward Capriolo <[email protected]> wrote:

> On Wed, Jul 29, 2009 at 12:58 AM, Mark Kerzner <[email protected]> wrote:
> > Hi,
> >
> > I set the number of reducers to 1, and I indeed get only one output
> > file, /output/part-00000.
> >
> > However, in configure() and in close() I do a System.out, and I see that
> > these are called three times, not once.
> >
> > Why does it matter to me? In configure() I open a zip file, into which I
> > write the binary parts of my maps, and in close() I close it. I would
> > expect these to be called just once, producing one zip file, but instead
> > they are called three times (and twice when running from the IDE), so it
> > produces three zip files. I have to play games so that the names of the
> > zip files don't collide - and I am not sure if this is stable.
> >
> > What am I missing in my understanding?
> >
> > Thank you,
> > Mark
>
> You should take a look at all the %speculative%execution properties:
>
> <property>
>   <name>mapred.reduce.tasks.speculative.execution</name>
>   <value>false</value>
> </property>
>
> These cause multiple copies of the same map/reduce to be executed to try
> to deal with slow mappers. In applications like web/ftp fetching or
> file/database writing you probably want these off.
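
For reference, the same properties can also be set programmatically in the job
driver. Below is a minimal sketch using the old org.apache.hadoop.mapred API
that this thread is based on; ZipJobDriver, MyMapper and MyReducer are
hypothetical names standing in for your own classes, not anything from the
original posts.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class ZipJobDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(ZipJobDriver.class);
            conf.setJobName("zip-writer");

            // One reducer -> one part-00000 (and one zip file written in close()).
            conf.setNumReduceTasks(1);

            // Turn off speculative execution so the framework does not launch
            // duplicate attempts of the same task, each of which would run
            // configure()/close() and write its own copy of the zip file.
            // Equivalent to setting mapred.map.tasks.speculative.execution and
            // mapred.reduce.tasks.speculative.execution to false.
            conf.setMapSpeculativeExecution(false);
            conf.setReduceSpeculativeExecution(false);

            conf.setMapperClass(MyMapper.class);    // hypothetical mapper class
            conf.setReducerClass(MyReducer.class);  // hypothetical reducer class

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf);
        }
    }

If the driver is run through ToolRunner/GenericOptionsParser, the same
settings can instead be passed on the command line, e.g.
-Dmapred.reduce.tasks.speculative.execution=false. Since a custom JAR step
on Amazon Elastic MapReduce just passes its arguments through to the JAR's
main class, that may also be a way to control it there, though I have not
verified this on EMR.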
