Found it: HADOOP-5210 "Reduce Task Progress shows > 100% when the total size of map outputs (for a single reducer) is high"
https://issues.apache.org/jira/browse/HADOOP-5210

On Thu, Jul 9, 2009 at 5:42 PM, Peter Skomoroch <[email protected]> wrote:

> I've seen this behavior before with reduces going over 100% on big jobs.
> What version of Hadoop are you using? I think there are some old bugs filed
> for this if you search the Jira.
>
> On Thu, Jul 9, 2009 at 5:31 PM, Aaron Kimball <[email protected]> wrote:
>
>> Reduce tasks that require more than twenty minutes are not a problem, but
>> you must emit some data periodically to inform the rest of the system that
>> each reducer is still alive. Emitting a (k, v) output pair to the collector
>> will reset the timer. Similarly, calling Reporter.incrCounter() will also
>> reset the clock. So if you're doing a large amount of processing in a loop
>> before you emit your final key-value pairs, you should periodically
>> increment a counter to allow the rest of the system to confirm that you're
>> not deadlocked.
>>
>> I'm not sure why your progress went so high. I know that Hadoop has some
>> quirks related to compression. If you've got compressed data, the reported
>> percentages might be inaccurate, since the completed/available_input data
>> ratio will be partially based on compressed sizes.
>>
>> - Aaron
>>
>> On Thu, Jul 9, 2009 at 12:24 PM, Prashant Ullegaddi <
>> [email protected]> wrote:
>>
>> > Hi Jothi,
>> >
>> > We are trying to index around 245 GB of compressed data (~1 TB
>> > uncompressed) on a 9-node Hadoop cluster with 8 slaves and 1 master.
>> > In the map phase, we are just parsing the files and passing them on to
>> > the reducers. In the reduce phase, we are indexing the parsed data,
>> > much in the Nutch style.
>> >
>> > When we ran the job, the maps finished in less than 4 hours. But
>> > something strange happened with the reduces: they went past 100%
>> > progress (some to 200%!) before getting killed. Is this some kind of
>> > bug in Hadoop?
>> >
>> > All eventually got killed saying "Task
>> > attempt_200907091637_0004_r_000000_0 failed to report status for 1201
>> > seconds. Killing!" But I guess indexing in reduce takes more than 1200
>> > seconds. How should we go about it?
>> >
>> > Thanks in advance,
>> > Prashant,
>> > Search and Information Extraction Lab,
>> > IIIT-Hyderabad,
>> > INDIA.

--
Peter N. Skomoroch
617.285.8348
http://www.datawrangling.com
http://delicious.com/pskomoroch
http://twitter.com/peteskomoroch
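Aaron's advice above, incrementing a counter inside a long reduce loop so the framework sees progress before any output is emitted, can be sketched as follows. This is a minimal illustration, not real Hadoop code: `ProgressReporter` here is a hypothetical stand-in for the `Reporter` object that Hadoop's old `mapred` API passes into `reduce()`, and the record count and reporting interval are made-up numbers.

```java
// Sketch of periodic progress reporting in a long-running reduce loop.
// NOTE: ProgressReporter is a hypothetical stand-in for Hadoop's Reporter;
// in a real reducer you would call incrCounter(...) on the Reporter that
// the framework passes into reduce(), which also resets the task timeout.
public class ReduceProgressSketch {

    // Stand-in for Reporter.incrCounter(group, counter, amount).
    static class ProgressReporter {
        long reported = 0;
        void incrCounter(String group, String counter, long amount) {
            reported += amount; // the real call also resets the liveness clock
        }
    }

    static final long REPORT_INTERVAL = 10_000; // assumed; tune to your workload

    static long indexRecords(long totalRecords, ProgressReporter reporter) {
        long processed = 0;
        for (long i = 0; i < totalRecords; i++) {
            // ... expensive per-record indexing work would happen here ...
            processed++;
            // Report every REPORT_INTERVAL records so the framework knows the
            // reducer is alive, even though no (k, v) pair has been emitted yet.
            if (processed % REPORT_INTERVAL == 0) {
                reporter.incrCounter("Indexing", "RECORDS_PROCESSED",
                        REPORT_INTERVAL);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        ProgressReporter reporter = new ProgressReporter();
        long processed = indexRecords(45_000, reporter);
        System.out.println(processed + " records processed, "
                + reporter.reported + " reported via counter");
    }
}
```

The "failed to report status for 1201 seconds" message corresponds to the task timeout (the `mapred.task.timeout` property, in milliseconds, in Hadoop of this era); raising it is possible, but periodic counter updates as above are the more robust fix, since they work no matter how long a single reduce call runs.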
