[ 
https://issues.apache.org/jira/browse/HADOOP-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707061#action_12707061
 ] 

Jakob Homan commented on HADOOP-5777:
-------------------------------------

Hairong and I determined that the issue was caused by a race condition: many 
nodes with the same storage ID registered at the same time (they came from 
cloned drives, which should not normally happen), and the ResolutionMonitor 
was not properly synchronized.  As a result, the network location of a node 
could be reset to UNRESOLVED (the empty string, "") before the node was passed 
to add, which causes the substring call to fail.
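
To illustrate why an empty network location makes a substring call throw, 
here is a minimal sketch.  It is not the NetworkTopology code: the class 
EmptyLocationDemo, both helper methods, and the "/dc1/rack1/host1" paths are 
hypothetical names chosen for the example.  It only assumes that the next 
ancestor name is obtained by stripping a parent path prefix from the node's 
location.

```java
class EmptyLocationDemo {
    // Stand-in for the UNRESOLVED value described above (an empty string).
    static final String UNRESOLVED = "";

    // Hypothetical helper: returns the first path component of `location`
    // below `parentPath`, e.g. ("/dc1", "/dc1/rack1/host1") gives "rack1".
    static String nextAncestorName(String parentPath, String location) {
        // Throws StringIndexOutOfBoundsException when `location` is shorter
        // than `parentPath`, which is always the case for an empty location
        // and a non-empty parent path.
        String rest = location.substring(parentPath.length());
        if (rest.startsWith("/")) {
            rest = rest.substring(1);
        }
        int slash = rest.indexOf('/');
        return slash == -1 ? rest : rest.substring(0, slash);
    }

    // Hypothetical guard: rejects an unresolved location before any
    // substring arithmetic is attempted.
    static String nextAncestorNameChecked(String parentPath, String location) {
        if (location == null || location.equals(UNRESOLVED)
                || !location.startsWith(parentPath)) {
            throw new IllegalArgumentException(
                "location is not resolved under '" + parentPath + "': '" + location + "'");
        }
        return nextAncestorName(parentPath, location);
    }

    public static void main(String[] args) {
        System.out.println(nextAncestorName("/dc1", "/dc1/rack1/host1"));
        try {
            nextAncestorName("/dc1", UNRESOLVED);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unchecked lookup failed: " + e.getClass().getSimpleName());
        }
        try {
            nextAncestorNameChecked("/dc1", UNRESOLVED);
        } catch (IllegalArgumentException e) {
            System.out.println("checked lookup rejected: " + e.getMessage());
        }
    }
}
```

The guard only turns the failure into a clearer error.  It does not address 
the underlying race, which would need the reset and the add to happen under 
the same lock.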

Because the ResolutionMonitor has now been removed, this is not worth fixing; 
I will close the issue as Won't Fix.

> ResolutionMonitor dies on an exception
> --------------------------------------
>
>                 Key: HADOOP-5777
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5777
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Hairong Kuang
>            Assignee: Jakob Homan
>
> One of our dfs clusters went into an unhealthy state, where many datanodes 
> have non-zero bytes but no rack information. It turned out the 
> ResolutionMonitor thread died on an exception. Here is the stack trace of the 
> exception that caused the problem:
> ERROR org.apache.hadoop.fs.FSNamesystem: 
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>         at java.lang.String.substring(String.java:1938)
>         at java.lang.String.substring(String.java:1905)
>         at org.apache.hadoop.net.NetworkTopology$InnerNode.getNextAncestorName(NetworkTopology.java:119)
>         at org.apache.hadoop.net.NetworkTopology$InnerNode.add(NetworkTopology.java:153)
>         at org.apache.hadoop.net.NetworkTopology.add(NetworkTopology.java:329)
>         at org.apache.hadoop.dfs.FSNamesystem$ResolutionMonitor.run(FSNamesystem.java:1885)
>         at java.lang.Thread.run(Thread.java:619)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
