Hi all,
Has anybody else seen a java.lang.ArrayIndexOutOfBoundsException
error displayed in the Diagnostic Text column of the jobdetail.jsp
page when running 0.8?
This seems to happen occasionally during the invert links phase. The
stack trace looks like:
java.lang.ArrayIndexOutOfBoundsException
    at java.util.zip.CRC32.update(CRC32.java:43)
    at org.apache.nutch.fs.NFSDataInputStream$Checker.read(NFSDataInputStream.java:92)
    at org.apache.nutch.fs.NFSDataInputStream$PositionCache.read(NFSDataInputStream.java:156)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
    at java.io.DataInputStream.readFully(DataInputStream.java:176)
    at org.apache.nutch.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:55)
    at org.apache.nutch.io.DataOutputBuffer.write(DataOutputBuffer.java:89)
    at org.apache.nutch.io.SequenceFile$Reader.next(SequenceFile.java:378)
    at org.apache.nutch.io.SequenceFile$Reader.next(SequenceFile.java:301)
    at org.apache.nutch.io.SequenceFile$Reader.next(SequenceFile.java:323)
    at org.apache.nutch.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:60)
    at org.apache.nutch.mapred.MapTask$2.next(MapTask.java:106)
    at org.apache.nutch.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.nutch.mapred.MapTask.run(MapTask.java:116)
    at org.apache.nutch.mapred.TaskTracker$Child.main(TaskTracker.java:603)
In our most recent trial, this occurred in 15 of 4840 map attempts
(along with 25 socket timeout errors, leaving 4800 maps that actually
completed).
I see that Rod Taylor reported an error from the same general
location (http://issues.apache.org/jira/browse/NUTCH-170), but his
reported stack had one additional entry, between the MapTask$2.next
and the SequenceFileRecordReader.next calls:
    org.apache.nutch.segment.SegmentReader$InputFormat$1.next(SegmentReader.java:80)
Seems like there might be a bug hiding in this area of the code. I'm
going to wrap some extra debugging around it to get more info when an
error does occur.
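One way such extra debugging could look is a thin stream wrapper that records how far the read has progressed and, when an index error escapes from the underlying stream, rethrows it with the buffer length, offset, and requested length attached. This is only a sketch under assumptions: the class name DebugInputStream and the message format are hypothetical and not part of Nutch, and it uses only java.io rather than the actual NFSDataInputStream internals.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not a Nutch class): wraps any InputStream and adds
// context to index errors thrown while reading into a caller's buffer.
class DebugInputStream extends FilterInputStream {
    // Number of bytes successfully read through this wrapper so far.
    private long pos = 0;

    DebugInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            pos++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        try {
            int n = super.read(buf, off, len);
            if (n > 0) {
                pos += n;
            }
            return n;
        } catch (IndexOutOfBoundsException e) {
            // ArrayIndexOutOfBoundsException is a subclass, so it is caught here too.
            // Keep the original exception as the cause so its stack is not lost.
            throw new IOException("index error while reading: pos=" + pos
                    + " buf.length=" + buf.length
                    + " off=" + off + " len=" + len, e);
        }
    }

    long getPos() {
        return pos;
    }

    public static void main(String[] args) throws IOException {
        DebugInputStream in =
                new DebugInputStream(new ByteArrayInputStream(new byte[] {1, 2, 3, 4}));
        byte[] buf = new byte[2];
        in.read(buf, 0, 2);          // normal read
        try {
            in.read(buf, 1, 5);      // deliberately invalid offset/length
        } catch (IOException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Wrapping the stream handed to the record reader in something like this would at least show whether the failing call was given an offset or length that does not fit the buffer, or whether the problem lies deeper in the checksum code.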
Thanks,
-- Ken