[ 
https://issues.apache.org/jira/browse/HADOOP-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484570
 ] 

Tom White commented on HADOOP-1159:
-----------------------------------

By catching Exception, the patch also catches InterruptedException, so the 
code can no longer be interrupted as described in the javadoc. This looks 
like it needs remedying. Could you supply another patch, please?
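
For illustration only, here is a minimal sketch of one common way to keep a 
retry loop interruptible while still catching Exception. It is not taken from 
h1159.patch or from the Hadoop source: the class InterruptibleRetry, the Task 
interface and the runWithRetries method are hypothetical names invented for 
this example. The idea is to handle InterruptedException in its own catch 
clause, ahead of the broader one, and to restore the thread's interrupt flag.

```java
public class InterruptibleRetry {

    /** Hypothetical unit of work; stands in for whatever the patch wraps. */
    public interface Task {
        void run() throws Exception;
    }

    /**
     * Runs the task until it succeeds or maxAttempts is reached.
     * Returns true on success. Returns false if every attempt failed or if
     * the thread was interrupted, in which case the interrupt flag is set
     * again so the caller can still observe the interruption.
     */
    public static boolean runWithRetries(Task task, int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                task.run();
                return true;
            } catch (InterruptedException ie) {
                // Listed before "catch (Exception e)" so it is not swallowed.
                Thread.currentThread().interrupt();
                return false;
            } catch (Exception e) {
                // Ordinary failure: back off, then retry.
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

Rethrowing the InterruptedException would work equally well if the method's 
signature allows it; which one fits depends on what the javadoc promises.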

Also, this looks different to HADOOP-1123 - so both patches need applying, I 
think. (Please correct me if I'm wrong.)

> Reducers hang when map output file has a checksum error
> -------------------------------------------------------
>
>                 Key: HADOOP-1159
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1159
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Nigel Daley
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.3
>
>         Attachments: h1159.patch
>
>
> Two reduces hung in our sort benchmark. They always fail to get map outputs 
> from node X due to checksum error when the map outputs are read at that node 
> resulting in a NullPointerException on node X. This leads to constant 
> failures on the two fetching reduces.
> 2007-03-26 00:02:57,082 WARN org.apache.hadoop.fs.FileSystem: Moving bad file /e/c/k/hqa/tb/tmp/mapred/local2/task_0002_m_022488_0/file.out to /e/c/bad_files/file.out.542279301
> 2007-03-26 00:02:57,083 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error: org.apache.hadoop.fs.ChecksumException: Checksum error: /e/c/k/hqa/tb/tmp/mapred/local2/task_0002_m_022488_0/file.out at 106484224
>       at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.verifySum(ChecksumFileSystem.java:254)
>       at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:211)
>       at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:167)
>       at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
>       at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>       at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>       at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>       at java.io.DataInputStream.read(DataInputStream.java:132)
>       at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1659)
>       at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
>       at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
>       at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
>       at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
>       at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
>       at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
>       at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
>       at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
>       at org.mortbay.http.HttpServer.service(HttpServer.java:954)
>       at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
>       at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
>       at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
>       at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
>       at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
>       at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
> 2007-03-26 00:02:57,083 WARN /: /mapOutput?map=task_0002_m_022488_0&reduce=1542: java.lang.NullPointerException

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
