On Sat, 29 Sep 2007, Michael McCandless wrote:
The new PyLucene is built with a code generator and all public APIs and
classes are made available to Python. SerialMergeScheduler is available.
Wild! Does this mean PyLucene will track tightly to Lucene releases
going forward?
Yes, even more tightly than before since I don't have to patch the Lucene
sources anymore.
What happened prior to this first optimize call? Did you just create
the writer, switch to SerialMergeScheduler, add N docs, then call
setInfoStream(...) and writer.optimize()?
Yes, that's almost exactly it. I create a new writer (with create=true), then
close it and its directory. Then I reopen it and add N docs.
The debug output starts with an optimize() call, which first flushes
372 docs to segment _7f; this is the first segment in the index. Had
you opened this writer with create=true?
I open the writer with create=true when the app creates its initial repository.
After that, the writer is opened without create=true and documents are added to it.
This optimize() does nothing because the index has only one segment
(_7f) in compound file format, so it's already optimized. Then the
writer is closed.
Then this is printed:
<DBRepositoryView: Lucene (1)> indexed 191 items in 0:00:00.413600
Which is odd because 191 != 372. Can't explain that difference...
That's because an item can have several attributes that get indexed, each
becoming a Lucene document (an item is a Chandler object).
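
To illustrate why the item count and the document count differ, here is a
minimal sketch in plain Python. It does not use PyLucene or Chandler code; the
items, attribute names, and the to_documents() helper are all hypothetical
stand-ins, and the only assumption taken from above is that each indexed
attribute of an item becomes its own document.

```python
# Hypothetical stand-in data: each item maps attribute names to indexed text.
# These are not real Chandler items; they only illustrate the counting.
items = [
    {"title": "Lunch", "body": "Noon at the cafe"},
    {"title": "Note"},
    {"title": "Trip", "body": "Pack bags", "location": "Airport"},
]


def to_documents(items):
    """Return one (item_index, attribute, text) record per indexed attribute."""
    return [
        (index, name, text)
        for index, item in enumerate(items)
        for name, text in item.items()
    ]


documents = to_documents(items)
# The document count is the sum of indexed attributes, not the item count.
print(len(items), "items ->", len(documents), "documents")
```

With this toy data, 3 items yield 6 documents, the same kind of gap as
191 items versus 372 docs.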
Then another index writer is opened, 5 docs are added, then optimize()
is called, which flushes 5 docs to segment _7g and converts it to
compound file format.
Finally we try to merge _7f and _7g for optimize, and we hit the EOF
exception trying to read the term vector for a doc from one of these
two segments.
Ok, this could explain why the test is passing. In the test I only do one
batch of indexing, not several like here. I missed that difference. My
apologies. I'm going to change my test now and report back...
Thank you for the explanations.
Andi..