Hi all,

Has anybody else seen a java.lang.ArrayIndexOutOfBoundsException displayed in the Diagnostic Text column of the jobdetail.jsp page when running 0.8?

It seems to happen occasionally during the invert links phase. The stack crawl looks like this:

java.lang.ArrayIndexOutOfBoundsException
    at java.util.zip.CRC32.update(CRC32.java:43)
    at org.apache.nutch.fs.NFSDataInputStream$Checker.read(NFSDataInputStream.java:92)
    at org.apache.nutch.fs.NFSDataInputStream$PositionCache.read(NFSDataInputStream.java:156)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
    at java.io.DataInputStream.readFully(DataInputStream.java:176)
    at org.apache.nutch.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:55)
    at org.apache.nutch.io.DataOutputBuffer.write(DataOutputBuffer.java:89)
    at org.apache.nutch.io.SequenceFile$Reader.next(SequenceFile.java:378)
    at org.apache.nutch.io.SequenceFile$Reader.next(SequenceFile.java:301)
    at org.apache.nutch.io.SequenceFile$Reader.next(SequenceFile.java:323)
    at org.apache.nutch.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:60)
    at org.apache.nutch.mapred.MapTask$2.next(MapTask.java:106)
    at org.apache.nutch.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.nutch.mapred.MapTask.run(MapTask.java:116)
    at org.apache.nutch.mapred.TaskTracker$Child.main(TaskTracker.java:603)
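For anyone digging into this: as far as I know, java.util.zip.CRC32.update(byte[], int, int) throws ArrayIndexOutOfBoundsException whenever off or len is negative or off + len is larger than the array. So my (unconfirmed) guess is that Checker.read is handing CRC32 a bad offset/length pair. The minimal sketch below only reproduces that failure mode with a made-up 512-byte buffer; it uses no Nutch code, and the class name and sizes are hypothetical.

```java
import java.util.zip.CRC32;

// Hypothetical demo (not Nutch code): CRC32.update(byte[], int, int)
// rejects any (off, len) pair that falls outside the array. A caller
// that computed len from a stale or wrong byte count would fail the
// same way as the top frame of the trace above.
public class Crc32BoundsDemo {
    public static void main(String[] args) {
        byte[] buf = new byte[512];
        CRC32 crc = new CRC32();

        // In bounds: checksum the first 100 bytes of the buffer.
        crc.update(buf, 0, 100);
        System.out.println("in-bounds update ok, crc=" + crc.getValue());

        // Out of bounds: off + len = 600, but buf.length is only 512.
        try {
            crc.update(buf, 100, 500);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: " + e);
        }
    }
}
```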

In our most recent trial, I saw this in 15 of 4840 map attempts (along with 25 socket timeout errors, leaving 4800 maps that actually completed).

I see that Rod Taylor reported an error from the same general location (http://issues.apache.org/jira/browse/NUTCH-170), but his stack trace had one additional frame:

org.apache.nutch.segment.SegmentReader$InputFormat$1.next(SegmentReader.java:80)

It appears between the MapTask$2.next and SequenceFileRecordReader.next frames.

It looks like there might be a bug hiding in this area of the code. I'm going to wrap some extra debugging around it so I get more information the next time the error occurs.
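One way I might do that, sketched under some assumptions: wrap the CRC32 in a java.util.zip.Checksum that checks the arguments itself and, on failure, throws an exception whose message includes the offset, length, buffer size, and a context string (such as the file being read). DebugChecksum and the "part-00000" context are hypothetical names, and I haven't checked how NFSDataInputStream$Checker actually creates its checksum, so where to plug this in is still an open question.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical debugging wrapper (not part of Nutch). It performs the
// same bounds check CRC32 does, but reports the offending values
// instead of throwing a bare ArrayIndexOutOfBoundsException.
public class DebugChecksum implements Checksum {
    private final CRC32 delegate = new CRC32();
    private final String context; // e.g. name of the file being read

    public DebugChecksum(String context) {
        this.context = context;
    }

    @Override
    public void update(int b) {
        delegate.update(b);
    }

    @Override
    public void update(byte[] b, int off, int len) {
        // A null b throws NullPointerException here, as CRC32 would.
        if (off < 0 || len < 0 || off > b.length - len) {
            throw new ArrayIndexOutOfBoundsException(
                "CRC32.update out of range in " + context
                + ": off=" + off + ", len=" + len
                + ", buffer.length=" + b.length);
        }
        delegate.update(b, off, len);
    }

    @Override
    public long getValue() {
        return delegate.getValue();
    }

    @Override
    public void reset() {
        delegate.reset();
    }

    public static void main(String[] args) {
        DebugChecksum sum = new DebugChecksum("part-00000");
        try {
            sum.update(new byte[512], 100, 500);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The message names the context, off, len, and buffer size.
            System.err.println(e.getMessage());
        }
    }
}
```

Because in-range calls go straight to the real CRC32, the checksum values should be unchanged; the wrapper only adds detail to the failure case.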

Thanks,

-- Ken