[ 
https://issues.apache.org/jira/browse/HADOOP-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637696#action_12637696
 ] 

Brian Bockelman commented on HADOOP-4351:
-----------------------------------------

Hey Hairong,

Unfortunately, the admins moved the namenode to a more permanent home and 
clobbered the existing logfiles in the process.  (And I fixed the corrupt 
blocks: I didn't want to live with a problematic file system for too long!)

The code printed out the replicas in the blocksMap (nodes 10, 145, 117) and the 
corrupt entries in the corruptReplicas variable (node 16).  The existing code 
calculates that 3 - 1 = 2 replicas must be good (this is a mistake, as 
corruptReplicas is not a subset of blocksMap); however, when it starts to 
populate the machineSet, it only gets as far as nodes 10 and 145, then throws 
the exception on 117.
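
To make the arithmetic above concrete, here is a minimal standalone sketch 
of the failure mode.  It is not the actual FSNamesystem code and not the 
attached patch: the class name, method names, and the use of plain strings 
for nodes are all hypothetical, chosen only to mirror the numbers in this 
comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; names do not come from Hadoop.
class ReplicaCountSketch {

    // Mirrors the mistaken reasoning: size the array as
    // (replicas in blocksMap) - (corrupt replicas), which is only valid
    // if every corrupt replica also appears in blocksMap.
    static String[] buggyLocations(List<String> blocksMapNodes,
                                   Set<String> corruptNodes) {
        int numCorrupt = corruptNodes.size();
        String[] machineSet = new String[blocksMapNodes.size() - numCorrupt];
        int i = 0;
        for (String node : blocksMapNodes) {
            if (!corruptNodes.contains(node)) {
                // Throws once more good nodes are found than the array holds.
                machineSet[i++] = node;
            }
        }
        return machineSet;
    }

    // One possible defensive variant (an assumption, not the patch):
    // collect the good replicas first, then size the array from the
    // number actually found.
    static String[] safeLocations(List<String> blocksMapNodes,
                                  Set<String> corruptNodes) {
        List<String> good = new ArrayList<>();
        for (String node : blocksMapNodes) {
            if (!corruptNodes.contains(node)) {
                good.add(node);
            }
        }
        return good.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Values taken from the comment: three replicas in blocksMap,
        // one corrupt entry that is NOT among them.
        List<String> blocksMapNodes =
            Arrays.asList("node10", "node145", "node117");
        Set<String> corruptNodes = new HashSet<>(Arrays.asList("node16"));

        try {
            buggyLocations(blocksMapNodes, corruptNodes);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("buggy version failed on the third node");
        }
        System.out.println(
            Arrays.toString(safeLocations(blocksMapNodes, corruptNodes)));
    }
}
```

With these inputs the buggy variant allocates two slots, fills them with 
nodes 10 and 145, and fails on 117, matching the behaviour described above.  
The defensive variant only hides the symptom; it does not explain why node 
16 is in corruptReplicas but missing from blocksMap.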

You're right - there is possibly an underlying problem here, and this patch 
only fixes a symptom of it.  However, it is still a useful thing to fix: it is 
rather painful to go through filesystem cleanup when fsck dies instantly.

I believe the log message was something like this one:

2008-10-07 05:04:24,021 INFO org.apache.hadoop.dfs.StateChange: BLOCK 
NameSystem.markBlockAsCorrupt: block blk_-5420894356244363410_2169 could not be 
marked as corrupt as it does not exists in blocksMap

I think if a data node reports it has a corrupt block, then it gets added to 
the corrupt map and not the blocksMap.  I took a peek at the underlying issue, 
and wasn't able to make much progress - an expert on FSNamesystem will be 
needed to find the underlying problem.

Brian

PS: I looked at the patch again - silly me, I leaked several unrelated changes 
of my own into this patch; please disregard any changes to files other than 
FSNamesystem.java.

> ArrayIndexOutOfBoundsException during fsck
> ------------------------------------------
>
>                 Key: HADOOP-4351
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4351
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.18.1
>            Reporter: Brian Bockelman
>         Attachments: fsck_hadoop_4351.patch
>
>
> After observing a lot of corrupted blocks, I suddenly started to get a lot of 
> ArrayIndexOutOfBoundsException.
> It appears to be an issue very similar to HADOOP-3649, which is supposed to 
> be fixed in 0.18.1.
> 2008-10-06 08:48:43,241 WARN /: /fsck?path=%2F:
> java.lang.ArrayIndexOutOfBoundsException: 2
>    at org.apache.hadoop.dfs.FSNamesystem.getBlockLocationsInternal(FSNamesystem.java:789)
>    at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:727)
>    at org.apache.hadoop.dfs.NamenodeFsck.check(NamenodeFsck.java:167)
>    at org.apache.hadoop.dfs.NamenodeFsck.check(NamenodeFsck.java:162)
>    at org.apache.hadoop.dfs.NamenodeFsck.check(NamenodeFsck.java:162)
>    at org.apache.hadoop.dfs.NamenodeFsck.check(NamenodeFsck.java:162)
>    at org.apache.hadoop.dfs.NamenodeFsck.check(NamenodeFsck.java:162)
>    at org.apache.hadoop.dfs.NamenodeFsck.fsck(NamenodeFsck.java:128)
>    at org.apache.hadoop.dfs.FsckServlet.doGet(FsckServlet.java:48)
>    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
>    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
>    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
>    at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
>    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
>    at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
>    at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
>    at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
>    at org.mortbay.http.HttpServer.service(HttpServer.java:954)
>    at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
>    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
>    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
>    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
>    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
>    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.