I've seen that the number of related failures is almost always the same as the number of rack-local mappers. Do you see this as well?
On Tuesday 31 January 2012 12:21:44 Marcin Cylke wrote:
> Hi
>
> I've upgraded my Hadoop cluster to version 1.0.0. The upgrade process
> went relatively smoothly, but it rendered the cluster inoperable due to
> errors in the jobtrackers' operation:
>
> # in job output
> Error reading task output
> http://hadoop4:50060/tasklog?plaintext=true&attemptid=attempt_201201311241_0003_m_000004_2&filter=stdout
>
> # in each of the jobtrackers' logs
> WARN org.apache.hadoop.mapred.TaskLog: Failed to retrieve stderr log for
> task: attempt_201201311241_0003_r_000000_1
> java.io.FileNotFoundException:
> /usr/lib/hadoop-1.0.0/libexec/../logs/userlogs/job_201201311241_0003/attempt_201201311241_0003_r_000000_1/log.index
> (No such file or directory)
> at java.io.FileInputStream.open(Native Method)
>
> These errors seem related to these two problems:
>
> http://grokbase.com/t/hadoop.apache.org/mapreduce-user/2012/01/error-reading-task-output-and-log-filenotfoundexceptions/03mjwctewcnxlgp2jkcrhvsgep4e
> https://issues.apache.org/jira/browse/MAPREDUCE-2846
>
> But I've looked into the source code, and the fix from MAPREDUCE-2846 is
> there. Perhaps there is some other reason?
>
> Regards
> Marcin

-- 
Markus Jelsma - CTO - Openindex
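Since the FileNotFoundException points at a missing log.index file under the userlogs directory, one quick way to gauge how widespread the problem is on a node is to list task-attempt directories that lack that file. This is just a diagnostic sketch; the LOG_DIR default is taken from the path in the error above and may need adjusting for your installation:

```shell
#!/bin/sh
# List task-attempt directories under userlogs that are missing log.index.
# LOG_DIR defaults to the path seen in the stack trace; override as needed.
LOG_DIR=${LOG_DIR:-/usr/lib/hadoop-1.0.0/logs/userlogs}

# Each job dir contains one subdirectory per task attempt; print any attempt
# directory where the log.index file does not exist.
find "$LOG_DIR" -mindepth 2 -maxdepth 2 -type d \
  ! -exec test -f '{}/log.index' \; -print
```

Comparing that list against the failed attempt IDs in the jobtracker log might show whether the index files were never written or were cleaned up afterwards.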