Hello,

Thanks so much for the reply. See inline.

On Fri, Jun 25, 2010 at 12:40 AM, Hemanth Yamijala <[email protected]> wrote:
> Hi,
>
>> I've been getting the following error when trying to run a very simple
>> MapReduce job. The map phase finishes without problems, but the error
>> occurs as soon as it enters the reduce phase.
>>
>> 10/06/24 18:41:00 INFO mapred.JobClient: Task Id :
>> attempt_201006241812_0001_r_000000_0, Status : FAILED
>> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>>
>> I am running a 5-node cluster and I believe I have all my settings correct:
>>
>> * ulimit -n 32768
>> * DNS/RDNS configured properly
>> * hdfs-site.xml: http://pastebin.com/xuZ17bPM
>> * mapred-site.xml: http://pastebin.com/JraVQZcW
>>
>> The program is very simple - it just counts a unique string in a log file.
>> See here: http://pastebin.com/5uRG3SFL
>>
>> When I run it, the job fails and I get the following output:
>> http://pastebin.com/AhW6StEb
>>
>> However, it runs fine when I do *not* use substring() on the value (see
>> the map function in the code above).
>>
>> This runs fine and completes successfully:
>> String str = val.toString();
>>
>> This causes the error and fails:
>> String str = val.toString().substring(0, 10);
>>
>> Please let me know if you need any further information.
>> It would be greatly appreciated if anyone could shed some light on this
>> problem.
>
> It catches attention that changing the code to use a substring is
> causing a difference. Assuming it is consistent and not a red herring,
Yes, this has been consistent over the last week. I was running 0.20.1
first and then upgraded to 0.20.2, but the results have been exactly
the same.

> can you look at the counters for the two jobs using the JobTracker web
> UI - things like map records, bytes etc and see if there is a
> noticeable difference ?

OK, so here is the first job, which uses write.set(value.toString());
and has *no* errors:
http://pastebin.com/xvy0iGwL

And here is the second job, which uses
write.set(value.toString().substring(0, 10)); and fails:
http://pastebin.com/uGw6yNqv

And here is yet another, where I used a longer (and therefore unique)
string via write.set(value.toString().substring(0, 20)); this makes
every line unique, similar to the first job. It still fails:
http://pastebin.com/GdQ1rp8i

> Also, are the two programs being run against
> the exact same input data ?

Yes, exactly the same input: a single CSV file with 23K lines. Using a
shorter substring leads to more identical keys and therefore more
combining/reducing, but going by the above it seems to fail whether the
substring/key is entirely unique (23000 combine output records) or
mostly the same (9 combine output records).

> Also, since the cluster size is small, you could also look at the
> tasktracker logs on the machines where the maps have run to see if
> there are any failures when the reduce attempts start failing.

Here is the TaskTracker log from the last failed job. I do not see
anything besides the shuffle failure, but there may be something I am
overlooking or simply do not understand.
http://pastebin.com/DKFTyGXg

Thanks again!

> Thanks
> Hemanth
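P.S. In case it helps anyone reading along, here is a minimal standalone sketch of the two map-side keying variants I described above. The class and method names here are hypothetical (the actual job is only in the pastebin link); this just isolates the substring behavior outside of Hadoop. One thing worth noting, though it may be unrelated to the shuffle error: a bare substring(0, 10) throws StringIndexOutOfBoundsException on any line shorter than 10 characters, so a length guard is common when keying on a line prefix.

```java
// Hypothetical sketch of the two keying variants from the job above.
public class SubstringCheck {

    // Variant 1: key on the whole line (the case that runs fine).
    static String wholeLine(String val) {
        return val;
    }

    // Variant 2: key on the first 10 characters (the failing case).
    // Guarded with min() because String.substring(0, 10) throws
    // StringIndexOutOfBoundsException when the line is shorter than 10.
    static String firstTenChars(String val) {
        return val.substring(0, Math.min(10, val.length()));
    }

    public static void main(String[] args) {
        System.out.println(wholeLine("a longer csv line"));     // a longer csv line
        System.out.println(firstTenChars("a longer csv line")); // a longer c
        System.out.println(firstTenChars("short"));             // short
    }
}
```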
