[ 
https://issues.apache.org/jira/browse/GIRAPH-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560018#comment-13560018
 ] 

Eli Reisman commented on GIRAPH-488:
------------------------------------

I think the index at which the local splits are re-inserted (the hash code of
the split, worker id and thread id combined, MOD the list size) is being modded
by the full list size (non-local + local splits) rather than by the size of the
temporary, smaller list (just the non-local splits) that we are re-inserting
the local splits into. As a result, we occasionally get an insertion index that
is larger than the size of the non-local splits list.
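
A minimal, hypothetical sketch of the failure mode described above (it is not
Giraph's actual InputSplitPathOrganizer code; the class, the helper names and
the plain int standing in for the combined hash are all assumptions). Using the
sizes from the stack trace below (225 splits total, a 205-entry non-local
list), reducing the hash modulo the full size can yield an index above 205, and
ArrayList.addAll(int, Collection) then throws IndexOutOfBoundsException, the
same exception type as in the trace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reproduction of the bug; BuggyReinsertDemo, names() and
// reinsertFails() are illustrative and not part of Giraph.
public class BuggyReinsertDemo {

  // Builds placeholder split names "split-<from>" .. "split-<to - 1>".
  static List<String> names(int from, int to) {
    List<String> out = new ArrayList<>();
    for (int i = from; i < to; i++) {
      out.add("split-" + i);
    }
    return out;
  }

  // Reduces the hash modulo the FULL list size (the bug), then tries to
  // re-insert the local splits into the shorter non-local list at that index.
  // Returns true if ArrayList.addAll(int, Collection) rejected the index.
  static boolean reinsertFails(int hash, int fullSize, int localCount) {
    List<String> nonLocal = names(0, fullSize - localCount);
    List<String> local = names(fullSize - localCount, fullSize);
    int index = Math.floorMod(hash, fullSize); // wrong modulus: full size
    try {
      nonLocal.addAll(index, local); // legal only while index <= nonLocal.size()
      return false;
    } catch (IndexOutOfBoundsException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // 225 splits total, 20 local, so the non-local list holds 205 entries.
    System.out.println("hash 100 fails? " + reinsertFails(100, 225, 20)); // false
    System.out.println("hash 210 fails? " + reinsertFails(210, 225, 20)); // true
  }
}
```

Note that an index exactly equal to the non-local size is still legal (it
appends), so only residues strictly greater than 205 trigger the exception.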

In Sebastian's case, we have a full list of 225 splits, but 20 were removed and
re-ordered as local to that thread. As bad luck would have it, the hashed index
to begin iterating from (and at which to re-inject that thread's local splits)
was modded against the ORIGINAL size of the split list, 225. That causes
problems, because we need to re-insert the 20 local splits at the "start index"
within the 205-entry non-local list to re-form the full 225-entry split list.
The fix is to mod the hashed index by the SMALL list size, and to save THAT
index value as the "start offset" from which the thread attempts to claim
splits (starting from where the local splits were re-injected into the list!).
This is a tricky problem because it only crops up as an error when the hashed
index happens to be larger than the short list's size, which will only happen
on some workers in a given job (maybe just one, maybe none at all).
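
A hypothetical sketch of the proposed fix, under the same assumptions as
before (illustrative names, an int standing in for the combined hash, not
Giraph's real code): mod by the non-local list size, re-insert the local
splits at that index, keep the index as the thread's start offset, and claim
splits from there, wrapping around to the front of the list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; FixedReinsertDemo, Ordering, prioritizeLocal(),
// claimOrder() and names() are illustrative and not part of Giraph.
public class FixedReinsertDemo {

  // The reordered split list plus the offset this thread starts claiming from.
  static final class Ordering {
    final List<String> splits;
    final int startOffset;

    Ordering(List<String> splits, int startOffset) {
      this.splits = splits;
      this.startOffset = startOffset;
    }
  }

  // Builds placeholder split names "split-<from>" .. "split-<to - 1>".
  static List<String> names(int from, int to) {
    List<String> out = new ArrayList<>();
    for (int i = from; i < to; i++) {
      out.add("split-" + i);
    }
    return out;
  }

  // Removes the local splits, reduces the hash modulo the SHORT (non-local)
  // list size, re-inserts the local splits at that index and remembers it.
  static Ordering prioritizeLocal(List<String> allSplits, List<String> localSplits, int hash) {
    List<String> nonLocal = new ArrayList<>(allSplits);
    nonLocal.removeAll(localSplits);
    // An empty non-local list would make the modulus zero; start at 0 instead.
    int offset = nonLocal.isEmpty() ? 0 : Math.floorMod(hash, nonLocal.size());
    nonLocal.addAll(offset, localSplits); // offset <= nonLocal.size(): never throws
    return new Ordering(nonLocal, offset);
  }

  // Visits every split once, starting at the offset (the local splits first)
  // and wrapping around to the front of the list.
  static List<String> claimOrder(Ordering o) {
    int n = o.splits.size();
    List<String> order = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      order.add(o.splits.get((o.startOffset + i) % n));
    }
    return order;
  }

  public static void main(String[] args) {
    List<String> all = names(0, 225);
    List<String> local = names(205, 225);
    Ordering o = prioritizeLocal(all, local, 210);
    System.out.println("start offset = " + o.startOffset);       // 5
    System.out.println("first claim  = " + claimOrder(o).get(0)); // split-205
  }
}
```

Math.floorMod is used instead of Math.abs(hash % size) so that negative hash
codes (including Integer.MIN_VALUE) still map into [0, size).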

                
> ArrayOutOfBoundsException in org.apache.giraph.worker.InputSplitPathOrganizer
> -----------------------------------------------------------------------------
>
>                 Key: GIRAPH-488
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-488
>             Project: Giraph
>          Issue Type: Bug
>    Affects Versions: 0.2.0
>            Reporter: Sebastian Schelter
>
> I ran into a strange exception when testing the RandomWalkVertex on a cluster
> of 26 machines running Hadoop 1.0.4:
> {noformat}
> java.lang.IllegalStateException: run: Caught an unrecoverable exception Index: 225, Size: 205
>       at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:735)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>       at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 225, Size: 205
>       at java.util.ArrayList.rangeCheckForAdd(ArrayList.java:612)
>       at java.util.ArrayList.addAll(ArrayList.java:554)
>       at org.apache.giraph.worker.InputSplitPathOrganizer.prioritizeLocalInputSplits(InputSplitPathOrganizer.java:140)
>       at org.apache.giraph.worker.InputSplitPathOrganizer.<init>(InputSplitPathOrganizer.java:93)
>       at org.apache.giraph.worker.InputSplitsCallable.<init>(InputSplitsCallable.java:140)
>       at org.apache.giraph.worker.VertexInputSplitsCallable.<init>(VertexInputSplitsCallable.java:97)
>       at org.apache.giraph.worker.VertexInputSplitsCallableFactory.newCallable(VertexInputSplitsCallableFactory.java:86)
>       at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:266)
>       at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:310)
>       at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:483)
>       at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:525)
>       at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:723)
>       ... 7 more
> {noformat}

