Koji - That looks like it did the trick - we're smooth sailing now. Thanks a lot!
On Mon, Mar 2, 2009 at 2:02 PM, Ryan Shih <[email protected]> wrote:

> Koji - That makes a lot of sense. The two tasks are probably stepping over
> each other. I'll give it a try and let you know how it goes.
>
> Malcolm - if you turned off speculative execution and are still getting the
> problem, it doesn't sound the same. Do you want to do a cut&paste of your
> reduce code and I'll see if I can spot anything suspicious?
>
>
> On Mon, Mar 2, 2009 at 1:15 PM, Malcolm Matalka <
> [email protected]> wrote:
>
>> I have a situation which may be related. I am running Hadoop 0.18.1. I
>> am on a cluster with 5 machines and testing on a very small input of 10
>> lines. The mapper produces either 1 or 0 outputs per line of input, yet
>> somehow I get 18 lines of output from the reducer. For example, I have
>> one input where the key is:
>> fd349fc441ff5e726577aeb94cceb1e4
>>
>> However, I added a print to the reducer to print keys right before
>> calling output.collect, and I have 3 instances of this key being printed.
>>
>> I have turned speculative execution off and still get this.
>>
>> Does this sound related? A known bug? Something I'm missing? Fixed in
>> 19.1?
>>
>> - Malcolm
>>
>>
>> -----Original Message-----
>> From: Koji Noguchi [mailto:[email protected]]
>> Sent: Monday, March 02, 2009 15:59
>> To: [email protected]
>> Subject: RE: Potential race condition (Hadoop 18.3)
>>
>> Ryan,
>>
>> If you're using getOutputPath, try replacing it with getWorkOutputPath.
>>
>> http://hadoop.apache.org/core/docs/r0.18.3/api/org/apache/hadoop/mapred/FileOutputFormat.html#getWorkOutputPath(org.apache.hadoop.mapred.JobConf)
>>
>> Koji
>>
>> -----Original Message-----
>> From: Ryan Shih [mailto:[email protected]]
>> Sent: Monday, March 02, 2009 11:01 AM
>> To: [email protected]
>> Subject: Potential race condition (Hadoop 18.3)
>>
>> Hi - I'm not sure yet, but I think I might be hitting a race condition in
>> Hadoop 18.3. What seems to happen is that in the reduce phase, some of my
>> tasks perform speculative execution, but when the initial task completes
>> successfully, it sends a kill to the newly started task. After all is said
>> and done, perhaps one in every five or ten jobs that kill their second
>> task ends up with zero or truncated output. When I turn off speculative
>> execution, the problem goes away. Are there known race conditions I
>> should be aware of in this area?
>>
>> Thanks in advance,
>> Ryan
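The mechanism behind Koji's suggestion can be illustrated without a cluster. The sketch below is a minimal, self-contained simulation (it does not use Hadoop; the map stands in for a filesystem, and the path and attempt names are made up for illustration): writing every task attempt to the one shared path that `getOutputPath` returns lets a killed speculative attempt clobber the winner's file, whereas writing to a per-attempt work directory, as `getWorkOutputPath` does, confines each attempt until the framework commits exactly one of them.

```java
import java.util.HashMap;
import java.util.Map;

public class SpeculativeRace {
    // Stand-in "filesystem": path -> file contents.
    static Map<String, String> fs = new HashMap<>();

    // getOutputPath-style: every attempt writes the same final path,
    // so whichever attempt writes last wins, even if it is killed mid-write.
    static void writeShared(String attempt, String data) {
        fs.put("/out/part-00000", data);
    }

    // getWorkOutputPath-style: each attempt writes its own private directory.
    static void writeWork(String attempt, String data) {
        fs.put("/out/_temporary/" + attempt + "/part-00000", data);
    }

    // Only the committed attempt's file is promoted to the final path;
    // a killed attempt's directory is simply discarded.
    static void commit(String attempt) {
        fs.put("/out/part-00000",
               fs.get("/out/_temporary/" + attempt + "/part-00000"));
    }

    public static void main(String[] args) {
        // Shared path: attempt_1 is speculative and killed after a partial write.
        writeShared("attempt_0", "complete output");
        writeShared("attempt_1", "truncated");           // too late to stop it
        System.out.println(fs.get("/out/part-00000"));   // prints: truncated

        // Per-attempt paths: the kill only throws away attempt_1's directory.
        writeWork("attempt_0", "complete output");
        writeWork("attempt_1", "truncated");
        commit("attempt_0");                             // framework commits the winner
        System.out.println(fs.get("/out/part-00000"));   // prints: complete output
    }
}
```

This is why turning off speculative execution also makes the symptom disappear: with only one attempt per task there is nobody left to race with, but the shared-path write is still fragile against retries after task failure.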
