I took a look at it again and noticed that the mappers fail when data is not local. Also, the number of failures of this type is always equal to the number of rack-local maps.
So, it seems that when data is not local my mapper fails. It makes sense that we only notice it when we have concurrent jobs running. Any hints to share?

On Tuesday 03 January 2012 22:55:08 Markus Jelsma wrote:
> Hi,
>
> On our 0.20.205.0 test cluster we sometimes see tasks failing for no clear
> reason. The task tracker logs show us:
>
> 2012-01-03 21:16:27,256 WARN org.apache.hadoop.mapred.TaskLog: Failed to retrieve stdout log for task: attempt_201201031651_0008_m_000233_0
> java.io.FileNotFoundException: /opt/hadoop/hadoop-0.20.205.0/libexec/../logs/userlogs/job_201201031651_0008/attempt_201201031651_0008_m_000233_0/log.index (No such file or directory)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(FileInputStream.java:120)
>     at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:102)
>     at org.apache.hadoop.mapred.TaskLog.getAllLogsFileDetails(TaskLog.java:187)
>     at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:422)
>     at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:81)
>     at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:296)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>     at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>     at org.mortbay.jetty.Server.handle(Server.java:326)
>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>
> However, if we inspect the log more closely we actually see it happening
> several times, but only one seems to be thrown simultaneously with the task
> failing. We see no errors in the datanode's log.
>
> I've been looking through the configuration descriptions of 0.20.205.0 but
> didn't find a setting that could fix this or is responsible for this.
>
> Any hints? Upgrade to 1.0? Patches?
> Thanks

-- 
Markus Jelsma - CTO - Openindex