[
https://issues.apache.org/jira/browse/GIRAPH-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560018#comment-13560018
]
Eli Reisman commented on GIRAPH-488:
------------------------------------
I think the index used to re-insert the local splits (hash code for split,
worker id, thread id together MOD list size) is being modded with the full list
size (non-local + local splits) rather than the size of the smaller temporary
list (just the non-local splits) that we are re-inserting the local splits into.
As a result, we occasionally get an insertion index that is bigger than the
non-local splits list size.

In Sebastian's case, we have a full list of 225 splits, but 15 were removed and
re-ordered as local to that thread. As bad luck would have it, the hashed index
to begin iterating from (and to re-inject the local splits at for that thread)
was modded against the ORIGINAL size of the split list, 225. That causes
problems because we need to re-insert the 15 local splits at the "start index"
within the 205-sized non-local list to re-form the full 225-sized split list.

Fixing this will mean modding the hashed index by the SMALL list size, and
saving THAT index value as the "start offset" from which the thread attempts to
claim splits (starting from where the local splits were re-injected into the
list!).

This is a tricky problem because it will only crop up as an error when the
hashed index we arrive at happens to be larger than the size of the short list
(which will only happen on some workers in a given job, maybe just one, maybe
not at all).
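
To make the list-size mix-up concrete, here is a minimal Java sketch. It is not
the actual InputSplitPathOrganizer code: the class name, the method names, and
the tiny 10-split example are all hypothetical and chosen only for
illustration. The one thing taken from the report is that the failure comes
from the range check in ArrayList.addAll(int, Collection).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; names and sizes are not from Giraph.
class SplitReinsertSketch {

  // Suspected bug: the hashed value is reduced modulo the FULL list size
  // (non-local + local), so the result can exceed the non-local list size.
  static int buggyStartIndex(int hash, int fullSize) {
    return Math.floorMod(hash, fullSize);
  }

  // Proposed fix: reduce modulo the size of the list we actually insert into.
  static int fixedStartIndex(int hash, int nonLocalSize) {
    return nonLocalSize == 0 ? 0 : Math.floorMod(hash, nonLocalSize);
  }

  // Re-inserts the local splits into a copy of the non-local list at
  // startIndex. ArrayList.addAll(int, Collection) throws
  // IndexOutOfBoundsException when startIndex > nonLocal.size().
  static List<String> reinsertLocalSplits(
      List<String> nonLocal, List<String> local, int startIndex) {
    List<String> result = new ArrayList<>(nonLocal);
    result.addAll(startIndex, local);
    return result;
  }

  public static void main(String[] args) {
    // Made-up example: 10 splits in total, 3 of them local to this thread.
    List<String> nonLocal =
        Arrays.asList("n0", "n1", "n2", "n3", "n4", "n5", "n6");
    List<String> local = Arrays.asList("L0", "L1", "L2");
    int fullSize = nonLocal.size() + local.size();
    int hash = 9; // stand-in for the combined split/worker/thread hash

    try {
      reinsertLocalSplits(nonLocal, local, buggyStartIndex(hash, fullSize));
    } catch (IndexOutOfBoundsException e) {
      System.out.println("modded by full size: " + e.getMessage());
    }

    int start = fixedStartIndex(hash, nonLocal.size());
    System.out.println("modded by non-local size: start offset " + start
        + ", list " + reinsertLocalSplits(nonLocal, local, start));
  }
}
```

With the fixed variant, the returned index doubles as the thread's "start
offset", since it points at the first re-injected local split.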
> ArrayOutOfBoundsException in org.apache.giraph.worker.InputSplitPathOrganizer
> -----------------------------------------------------------------------------
>
> Key: GIRAPH-488
> URL: https://issues.apache.org/jira/browse/GIRAPH-488
> Project: Giraph
> Issue Type: Bug
> Affects Versions: 0.2.0
> Reporter: Sebastian Schelter
>
> I ran into a strange exception when testing the RandomWalkVertex on a cluster
> of 26 machines running Hadoop 1.0.4:
> {noformat}
> java.lang.IllegalStateException: run: Caught an unrecoverable exception
> Index: 225, Size: 205
> at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:735)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 225, Size: 205
> at java.util.ArrayList.rangeCheckForAdd(ArrayList.java:612)
> at java.util.ArrayList.addAll(ArrayList.java:554)
> at org.apache.giraph.worker.InputSplitPathOrganizer.prioritizeLocalInputSplits(InputSplitPathOrganizer.java:140)
> at org.apache.giraph.worker.InputSplitPathOrganizer.<init>(InputSplitPathOrganizer.java:93)
> at org.apache.giraph.worker.InputSplitsCallable.<init>(InputSplitsCallable.java:140)
> at org.apache.giraph.worker.VertexInputSplitsCallable.<init>(VertexInputSplitsCallable.java:97)
> at org.apache.giraph.worker.VertexInputSplitsCallableFactory.newCallable(VertexInputSplitsCallableFactory.java:86)
> at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:266)
> at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:310)
> at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:483)
> at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:525)
> at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:723)
> ... 7 more
> {noformat}