On Fri, 28 Sep 2007, Andi Vajda wrote:

I found a bug when indexing documents that contain fields with term vectors. Indexing fails with 'read past EOF' errors in what seems to be the index optimizing phase during addIndexes(). (I index first into a RAMDirectory, then addIndexes() into an FSDirectory.)

I have not formally filed the bug yet, as I first need to isolate the code. If I turn term vectors off, indexing completes fine.

I tried all morning to isolate the problem, but I seem to be unable to reproduce it in a simple unit test. In my application, I've been able to get errors by doing even less: just creating an FSDirectory and adding documents that have fields with term vectors fails with the error below when optimizing the index. I even tried adding the same documents, in the same order, in the unit test, but to no avail: it just works.

What is different about my environment? Well, I'm running PyLucene, but the new one, the one using Apple's Java VM, which is the same VM I'm using to run the unit test. And I'm not doing anything special like calling back into Python; I'm just calling regular Lucene APIs, adding documents to an IndexWriter on an FSDirectory using a StandardAnalyzer. If I stop using term vectors, everything works fine.

I'd like to get to the bottom of this but could use some help. Does the stack trace below ring a bell? Is there a way to run the whole indexing and optimizing process in a single thread?

Thanks !

Andi..

Exception in thread "Thread-4" org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: read past EOF
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:263)
Caused by: java.io.IOException: read past EOF
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:146)
        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
        at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:76)
        at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:207)
        at org.apache.lucene.index.SegmentReader.getTermFreqVectors(SegmentReader.java:692)
        at org.apache.lucene.index.SegmentMerger.mergeVectors(SegmentMerger.java:279)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:2898)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2647)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:232)
java.io.IOException: background merge hit exception: _5u:c372 _5v:c5 into _5w [optimize]
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1621)
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1571)
Caused by: java.io.IOException: read past EOF
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:146)
        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
        at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:76)
        at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:207)
        at org.apache.lucene.index.SegmentReader.getTermFreqVectors(SegmentReader.java:692)
        at org.apache.lucene.index.SegmentMerger.mergeVectors(SegmentMerger.java:279)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:2898)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2647)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:232)
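
For anyone reading the trace without the Lucene source at hand: the innermost frames (readVInt calling readByte calling refill) suggest that the merge was decoding a variable-length integer from the term vector data and ran out of bytes before the value was complete. The following is a minimal standalone sketch of that failure mode, not Lucene's actual code. The class VIntSketch and its two helper methods are hypothetical names, and it reads from a plain java.io.InputStream rather than an IndexInput.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class VIntSketch {
    // Decode one variable-length int: 7 payload bits per byte, low-order
    // group first, with the high bit set on every byte except the last.
    static int readVInt(InputStream in) throws IOException {
        int b = readByte(in);
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = readByte(in);
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Read a single byte, failing with the same message as the trace
    // when the underlying data runs out.
    static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new IOException("read past EOF");
        }
        return b;
    }

    public static void main(String[] args) throws IOException {
        // 0xAC 0x02 encodes 300: 0x2C | (0x02 << 7) = 44 + 256.
        byte[] complete = {(byte) 0xAC, 0x02};
        System.out.println(readVInt(new ByteArrayInputStream(complete)));

        // The same value with its last byte missing: the continuation bit
        // promises another byte that is not there.
        byte[] truncated = {(byte) 0xAC};
        try {
            readVInt(new ByteArrayInputStream(truncated));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real cause is similar, the term vector files of one of the segments being merged would be shorter than, or out of step with, what the reader expects. That is only a guess from the frame names.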
