[ https://issues.apache.org/jira/browse/HADOOP-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485337 ]
Tom White commented on HADOOP-1159:
-----------------------------------

Catching NPEs is generally considered bad form since it hides the problem. In this case it's not clear what's null. Would it be possible to rewrite this patch to be more explicit and check for the null value and handle it appropriately (doing what the catch block does)?

Also, Devaraj said the client only gets the IOException, so is the client patch needed?

> Reducers hang when map output file has a checksum error
> -------------------------------------------------------
>
>                 Key: HADOOP-1159
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1159
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Nigel Daley
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.3
>
>         Attachments: 1159-merge.patch, 1159.patch, h1159-2.patch, h1159.patch
>
>
> Two reduces hung in our sort benchmark. They always fail to get map outputs
> from node X due to checksum error when the map outputs are read at that node
> resulting in a NullPointerException on node X. This leads to constant
> failures on the two fetching reduces.
>
> 2007-03-26 00:02:57,082 WARN org.apache.hadoop.fs.FileSystem: Moving bad file /e/c/k/hqa/tb/tmp/mapred/local2/task_0002_m_022488_0/file.out to /e/c/bad_files/file.out.542279301
> 2007-03-26 00:02:57,083 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error: org.apache.hadoop.fs.ChecksumException: Checksum error: /e/c/k/hqa/tb/tmp/mapred/local2/task_0002_m_022488_0/file.out at 106484224
>     at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.verifySum(ChecksumFileSystem.java:254)
>     at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:211)
>     at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:167)
>     at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>     at java.io.DataInputStream.read(DataInputStream.java:132)
>     at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1659)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
>     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
>     at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
>     at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
>     at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
>     at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
>     at org.mortbay.http.HttpServer.service(HttpServer.java:954)
>     at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
>     at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
>     at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
>     at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
>     at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
>     at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
> 2007-03-26 00:02:57,083 WARN /: /mapOutput?map=task_0002_m_022488_0&reduce=1542: java.lang.NullPointerException
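
To illustrate the request in the comment above, here is a minimal sketch in Java of the difference between catching the NullPointerException and checking for the null value explicitly. Every name in it (NullCheckSketch, OUTPUT_FILES, the two lookup methods) is hypothetical and is not taken from the 1159 patches or from TaskTracker; which reference is actually null in MapOutputServlet is not established in this thread, so the map lookup only stands in for it.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class NullCheckSketch {
    // Hypothetical registry standing in for whatever lookup can return null.
    static final Map<String, String> OUTPUT_FILES = new HashMap<>();

    // Discouraged: any null dereference inside the try block lands in the
    // same catch, so the resulting message cannot say which reference was null.
    static int pathLengthCatchingNpe(String mapId) throws IOException {
        try {
            return OUTPUT_FILES.get(mapId).length();
        } catch (NullPointerException e) {
            throw new IOException("could not serve output for " + mapId);
        }
    }

    // Suggested: test the one value that may be null, name it in the
    // message, and then do what the catch block did.
    static int pathLengthWithNullCheck(String mapId) throws IOException {
        String path = OUTPUT_FILES.get(mapId);
        if (path == null) {
            throw new IOException("no output file registered for " + mapId);
        }
        return path.length();
    }
}
```

Both variants surface an IOException to the caller, but only the second records what was missing, and it no longer masks an unrelated NullPointerException raised elsewhere in the same block.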