On Fri, 28 Sep 2007, Michael McCandless wrote:
I tried all morning to isolate the problem but I seem to be unable
to reproduce it in a simple unit test. In my application, I've been
able to get errors by doing even less: just creating a FSDirectory
and adding documents with fields with term vectors fails when
optimizing the index with the error below. I even tried to add the
same documents, in the same order, in the unit test but to no
avail. It just works.
Are you trying your unit test first in Python (using PyLucene)?
No, I wrote it in Java to begin with. But it's a good idea. I should try it
from PyLucene too.
What is different about my environment ? Well, I'm running PyLucene,
but the new one, the one using a Apple's Java VM, the same VM I'm
using to run the unit test. And I'm not doing anything special like
calling back into Python or something, I'm just calling regular
Lucene APIs adding documents into an IndexWriter on an FSDirectory
using a StandardAnalyzer. If I stop using term vectors, all is
working fine.
Are your documents irregular wrt term vectors? (Ie some docs have
none, others store the terms but not positions/offsets, etc?). Any
interesting changes to Lucene's defaults (autoCommit=false, etc)?
All default config values, no config changes. All documents follow the same
pattern of having five fields, one with term vectors.
I'd like to get to the bottom of this but could use some help. Does
the stacktrace below ring a bell ? Is there a way to run the whole
indexing and optimizing in one single thread ?
You can easily turn off the concurrent (background) merges by doing
this:
writer.setMergeScheduler(new SerialMergeScheduler())
though that probably isn't punched through to Python in PyLucene. You
can also build a Lucene JAR w/ a small change to IndexWriter.java to
do the same thing.
The new PyLucene is built with a code generator and all public APIs and
classes are made available to Python. SerialMergeScheduler is available.
That stacktrace is happening while merging term vectors during an
optimize. It's specifically occuring when loading the term vectors
for a given doc X; we read a position from the index stream (tvx) just
fine, but then when we try to read the first vInt from the document
stream (tvd) we hit the EOF exception. So that position was too large
or the tvd file was somehow truncated. Weird.
Can you call "writer.setInfoStream(System.out)" and get the error to
occur and then post the resulting log? It may shed some light
here....
I called writer.setMergeScheduler(SerialMergeScheduler()) just after creating
the writer and called writer.setInfoStream(System.out) just before calling
optimize(). Below is what I get:
IW [Thread-0]: flush: segment=_7f docStoreSegment=_7f docStoreOffset=0
flushDocs=true flushDeletes=false flushDocStores=true numDocs=372
numBufDelTerms=0
IW [Thread-0]: index before flush
closeDocStore: 5 files to flush to segment _7f
flush postings as segment _7f numDocs=372
oldRAMSize=75040 newFlushedSize=21018 docs/MB=18,558.867 new/old=28.009%
IW [Thread-0]: checkpoint: wrote segments file "segments_9j"
Deleter [Thread-0]: now checkpoint "segments_9j" [1 segments ; isCommit =
true]
Deleter [Thread-0]: IncRef "_7f.fnm": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7f.frq": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7f.prx": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7f.tis": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7f.tii": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7f.nrm": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7f.tvx": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7f.tvf": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7f.tvd": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7f.fdx": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7f.fdt": pre-incr count is 0
Deleter [Thread-0]: deleteCommits: now remove commit "segments_9i"
Deleter [Thread-0]: DecRef "segments_9i": pre-decr count is 1
Deleter [Thread-0]: delete "segments_9i"
IW [Thread-0]: checkpoint: wrote segments file "segments_9k"
Deleter [Thread-0]: now checkpoint "segments_9k" [1 segments ; isCommit =
true]
Deleter [Thread-0]: IncRef "_7f.cfs": pre-incr count is 0
Deleter [Thread-0]: deleteCommits: now remove commit "segments_9j"
Deleter [Thread-0]: DecRef "_7f.fnm": pre-decr count is 1
Deleter [Thread-0]: delete "_7f.fnm"
Deleter [Thread-0]: DecRef "_7f.frq": pre-decr count is 1
Deleter [Thread-0]: delete "_7f.frq"
Deleter [Thread-0]: DecRef "_7f.prx": pre-decr count is 1
Deleter [Thread-0]: delete "_7f.prx"
Deleter [Thread-0]: DecRef "_7f.tis": pre-decr count is 1
Deleter [Thread-0]: delete "_7f.tis"
Deleter [Thread-0]: DecRef "_7f.tii": pre-decr count is 1
Deleter [Thread-0]: delete "_7f.tii"
Deleter [Thread-0]: DecRef "_7f.nrm": pre-decr count is 1
Deleter [Thread-0]: delete "_7f.nrm"
Deleter [Thread-0]: DecRef "_7f.tvx": pre-decr count is 1
Deleter [Thread-0]: delete "_7f.tvx"
Deleter [Thread-0]: DecRef "_7f.tvf": pre-decr count is 1
Deleter [Thread-0]: delete "_7f.tvf"
Deleter [Thread-0]: DecRef "_7f.tvd": pre-decr count is 1
Deleter [Thread-0]: delete "_7f.tvd"
Deleter [Thread-0]: DecRef "_7f.fdx": pre-decr count is 1
Deleter [Thread-0]: delete "_7f.fdx"
Deleter [Thread-0]: DecRef "_7f.fdt": pre-decr count is 1
Deleter [Thread-0]: delete "_7f.fdt"
Deleter [Thread-0]: DecRef "segments_9j": pre-decr count is 1
Deleter [Thread-0]: delete "segments_9j"
IW [Thread-0]: optimize: index now _7f:c372
IW [Thread-0]: flush: segment=null docStoreSegment=null docStoreOffset=0
flushDocs=false flushDeletes=false flushDocStores=false numDocs=0
numBufDelTerms=0
IW [Thread-0]: index before flush _7f:c372
IW [Thread-0]: at close: _7f:c372
<DBRepositoryView: Lucene (1)> indexed 191 items in 0:00:00.413600
IW [Thread-0]: flush: segment=_7g docStoreSegment=_7g docStoreOffset=0
flushDocs=true flushDeletes=false flushDocStores=true numDocs=5
numBufDelTerms=0
IW [Thread-0]: index before flush _7f:c372
closeDocStore: 5 files to flush to segment _7g
flush postings as segment _7g numDocs=5
oldRAMSize=76960 newFlushedSize=437 docs/MB=11,997.437 new/old=0.568%
IW [Thread-0]: checkpoint: wrote segments file "segments_9l"
Deleter [Thread-0]: now checkpoint "segments_9l" [2 segments ; isCommit =
true]
Deleter [Thread-0]: IncRef "_7f.cfs": pre-incr count is 1
Deleter [Thread-0]: IncRef "_7g.fnm": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7g.frq": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7g.prx": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7g.tis": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7g.tii": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7g.nrm": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7g.tvx": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7g.tvf": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7g.tvd": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7g.fdx": pre-incr count is 0
Deleter [Thread-0]: IncRef "_7g.fdt": pre-incr count is 0
Deleter [Thread-0]: deleteCommits: now remove commit "segments_9k"
Deleter [Thread-0]: DecRef "_7f.cfs": pre-decr count is 2
Deleter [Thread-0]: DecRef "segments_9k": pre-decr count is 1
Deleter [Thread-0]: delete "segments_9k"
IW [Thread-0]: checkpoint: wrote segments file "segments_9m"
Deleter [Thread-0]: now checkpoint "segments_9m" [2 segments ; isCommit =
true]
Deleter [Thread-0]: IncRef "_7f.cfs": pre-incr count is 1
Deleter [Thread-0]: IncRef "_7g.cfs": pre-incr count is 0
Deleter [Thread-0]: deleteCommits: now remove commit "segments_9l"
Deleter [Thread-0]: DecRef "_7f.cfs": pre-decr count is 2
Deleter [Thread-0]: DecRef "_7g.fnm": pre-decr count is 1
Deleter [Thread-0]: delete "_7g.fnm"
Deleter [Thread-0]: DecRef "_7g.frq": pre-decr count is 1
Deleter [Thread-0]: delete "_7g.frq"
Deleter [Thread-0]: DecRef "_7g.prx": pre-decr count is 1
Deleter [Thread-0]: delete "_7g.prx"
Deleter [Thread-0]: DecRef "_7g.tis": pre-decr count is 1
Deleter [Thread-0]: delete "_7g.tis"
Deleter [Thread-0]: DecRef "_7g.tii": pre-decr count is 1
Deleter [Thread-0]: delete "_7g.tii"
Deleter [Thread-0]: DecRef "_7g.nrm": pre-decr count is 1
Deleter [Thread-0]: delete "_7g.nrm"
Deleter [Thread-0]: DecRef "_7g.tvx": pre-decr count is 1
Deleter [Thread-0]: delete "_7g.tvx"
Deleter [Thread-0]: DecRef "_7g.tvf": pre-decr count is 1
Deleter [Thread-0]: delete "_7g.tvf"
Deleter [Thread-0]: DecRef "_7g.tvd": pre-decr count is 1
Deleter [Thread-0]: delete "_7g.tvd"
Deleter [Thread-0]: DecRef "_7g.fdx": pre-decr count is 1
Deleter [Thread-0]: delete "_7g.fdx"
Deleter [Thread-0]: DecRef "_7g.fdt": pre-decr count is 1
Deleter [Thread-0]: delete "_7g.fdt"
Deleter [Thread-0]: DecRef "segments_9l": pre-decr count is 1
Deleter [Thread-0]: delete "segments_9l"
IW [Thread-0]: optimize: index now _7f:c372 _7g:c5
IW [Thread-0]: add merge to pendingMerges: _7f:c372 _7g:c5 [optimize] [total 1
pending]
Deleter [Thread-0]: IncRef "_7f.cfs": pre-incr count is 1
Deleter [Thread-0]: IncRef "_7g.cfs": pre-incr count is 1
IW [Thread-0]: now merge
merge=_7f:c372 _7g:c5 into _7h [optimize]
index=_7f:c372 _7g:c5
IW [Thread-0]: merging _7f:c372 _7g:c5 into _7h [optimize]
IW [Thread-0]: merge: total 377 docs
IW [Thread-0]: hit exception during merge; now refresh deleter on segment _7h
Deleter [Thread-0]: refresh [prefix=_7h]: removing newly created unreferenced
file "_7h.fdt"
Deleter [Thread-0]: delete "_7h.fdt"
Deleter [Thread-0]: refresh [prefix=_7h]: removing newly created unreferenced
file "_7h.fdx"
Deleter [Thread-0]: delete "_7h.fdx"
Deleter [Thread-0]: refresh [prefix=_7h]: removing newly created unreferenced
file "_7h.fnm"
Deleter [Thread-0]: delete "_7h.fnm"
Deleter [Thread-0]: refresh [prefix=_7h]: removing newly created unreferenced
file "_7h.frq"
Deleter [Thread-0]: delete "_7h.frq"
Deleter [Thread-0]: refresh [prefix=_7h]: removing newly created unreferenced
file "_7h.nrm"
Deleter [Thread-0]: delete "_7h.nrm"
Deleter [Thread-0]: refresh [prefix=_7h]: removing newly created unreferenced
file "_7h.prx"
Deleter [Thread-0]: delete "_7h.prx"
Deleter [Thread-0]: refresh [prefix=_7h]: removing newly created unreferenced
file "_7h.tii"
Deleter [Thread-0]: delete "_7h.tii"
Deleter [Thread-0]: refresh [prefix=_7h]: removing newly created unreferenced
file "_7h.tis"
Deleter [Thread-0]: delete "_7h.tis"
Deleter [Thread-0]: refresh [prefix=_7h]: removing newly created unreferenced
file "_7h.tvd"
Deleter [Thread-0]: delete "_7h.tvd"
Deleter [Thread-0]: refresh [prefix=_7h]: removing newly created unreferenced
file "_7h.tvf"
Deleter [Thread-0]: delete "_7h.tvf"
Deleter [Thread-0]: refresh [prefix=_7h]: removing newly created unreferenced
file "_7h.tvx"
Deleter [Thread-0]: delete "_7h.tvx"
IW [Thread-0]: hit exception during merge
Deleter [Thread-0]: DecRef "_7f.cfs": pre-decr count is 2
Deleter [Thread-0]: DecRef "_7g.cfs": pre-decr count is 2
java.io.IOException: read past EOF
at
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:146)
at
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:76)
at
org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:207)
at
org.apache.lucene.index.SegmentReader.getTermFreqVectors(SegmentReader.java:692)
at
org.apache.lucene.index.SegmentMerger.mergeVectors(SegmentMerger.java:279)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
at
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:2898)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2647)
at
org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
at
org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1668)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1604)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1571)
IW [Thread-0]: flush: segment=null docStoreSegment=null docStoreOffset=0
flushDocs=false flushDeletes=false flushDocStores=false numDocs=0
numBufDelTerms=0
IW [Thread-0]: index before flush _7f:c372 _7g:c5
IW [Thread-0]: close: wrote segments file "segments_9n"
Deleter [Thread-0]: now checkpoint "segments_9n" [2 segments ; isCommit =
true]
Deleter [Thread-0]: IncRef "_7f.cfs": pre-incr count is 1
Deleter [Thread-0]: IncRef "_7g.cfs": pre-incr count is 1
Deleter [Thread-0]: deleteCommits: now remove commit "segments_9m"
Deleter [Thread-0]: DecRef "_7f.cfs": pre-decr count is 2
Deleter [Thread-0]: DecRef "_7g.cfs": pre-decr count is 2
Deleter [Thread-0]: DecRef "segments_9m": pre-decr count is 1
Deleter [Thread-0]: delete "segments_9m"
IW [Thread-0]: at close: _7f:c372 _7g:c5
Andi..
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]