Hi Jothi,

We are trying to index around 245GB of compressed data (~1TB uncompressed)
on a 9-node Hadoop cluster with 8 slaves and 1 master. In the map phase we
just parse the files and pass the records on to the reducers; in the
reduce phase we index the parsed data, much like Nutch does.

When we ran the job, the map phase finished in under 4 hours. But something
strange happened with the reduces: their progress went past 100% (some
showed 200+%!) before they were killed. Is this some kind of bug in Hadoop?

All of them were eventually killed with: "Task
attempt_200907091637_0004_r_000000_0 failed to report status for 1201
seconds. Killing!" But indexing in the reduce genuinely takes longer than
1200 seconds. How should we go about this?
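One workaround we are considering (besides raising mapred.task.timeout) is to keep the task alive by reporting progress from a background thread while the long indexing call runs. Here is a self-contained sketch of the pattern; progress() is a stub standing in for Hadoop's Reporter.progress() (old API) or Context.progress() (new API), and the sleep stands in for our slow indexing step. Is this a reasonable approach?

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ProgressHeartbeat {
    // Counts heartbeat pings; in a real reducer this would not exist --
    // we would call reporter.progress() directly instead.
    static final AtomicInteger pings = new AtomicInteger(0);

    // Stand-in for Reporter.progress() / Context.progress().
    static void progress() {
        pings.incrementAndGet();
    }

    public static void main(String[] args) throws Exception {
        // Heartbeat thread: ping progress every 100 ms here; in a real
        // job, any interval well under mapred.task.timeout (e.g. 60 s).
        Thread heartbeat = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    progress();
                    Thread.sleep(100);
                }
            } catch (InterruptedException e) {
                // shutting down
            }
        });
        heartbeat.setDaemon(true);
        heartbeat.start();

        // Stand-in for the long-running indexing call in reduce().
        Thread.sleep(500);

        heartbeat.interrupt();
        heartbeat.join();
        System.out.println("pings=" + pings.get());
    }
}
```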


Thanks in advance,
Prashant,
Search and Information Extraction Lab,
IIIT-Hyderabad,
INDIA.
