Re: Lucene.Net 2.1 status

Simone Busoli Thu, 13 Sep 2007 14:58:45 -0700

Hi George,

Actually, I don't know how it works in Java. I'm not a Java developer, and I couldn't easily set up a Java environment to test it.
From what I see, this might not even be related to the FAQ entry I sent you (although it looks like it is). I just realized that if I optimize an index while an IndexSearcher is open on it, the result is "less optimized" than it would be with no searchers open on it. That is, the index directory keeps another file of the same size as the main index file, which I guess is a duplicate.

I also just verified that if I don't keep a searcher open on the index during optimization, the optimization process works as expected.

So it looks like this issue is not solved in Lucene.Net. I don't have test code, but you can find a complete application here (http://code.google.com/p/cs2project/). I guess you don't have a lot of time to test it, but I'm pretty sure I'm right about this, since simply not opening an IndexSearcher is enough for the index to be optimized correctly.

Here are the exact steps of my application to reproduce the issue:

1 - Open an IndexSearcher (which, in turn, opens an IndexReader) to use for searches
2 - Open an IndexReader to delete old documents from the index
3 - Close the IndexReader opened in the previous step
4 - Open an IndexWriter to add new documents
5 - Call Optimize() and then close the IndexWriter opened above (notice that the IndexSearcher opened at step 1 is still open here)
6 - Close the IndexSearcher opened at step 1
7 - Create a new IndexSearcher so that searches include the newly added documents and exclude the deleted ones.

If you remove steps 1, 6 and 7 from the process (i.e., you never open a searcher), the optimization triggered at step 5 works as expected; otherwise the issue I reported occurs and the main index file is duplicated.
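
The steps above can be modelled with a small self-contained sketch. It does not use Lucene.Net at all: it is a toy in-memory directory that assumes Windows-style semantics, where a file that is open for reading cannot be deleted. The class, the function names and the file names (main.cfs, new.cfs, optimized.cfs) are hypothetical and chosen only for illustration.

```python
class ToyDirectory:
    """In-memory stand-in for an index directory (not Lucene code).

    Assumption: a file with at least one open read handle cannot be deleted,
    which mimics the Windows behavior described in the FAQ.
    """

    def __init__(self):
        self.files = {}        # file name -> size
        self.open_counts = {}  # file name -> number of open read handles

    def open_for_read(self, name):
        self.open_counts[name] = self.open_counts.get(name, 0) + 1

    def close(self, name):
        self.open_counts[name] -= 1

    def try_delete(self, name):
        """Delete the file unless a reader still holds it open."""
        if self.open_counts.get(name, 0) > 0:
            return False
        del self.files[name]
        return True


def optimize(directory, merged_name):
    """Merge all current files into one, then try to remove the old ones.

    Returns the names of old files that could not be deleted.
    """
    old = list(directory.files)
    directory.files[merged_name] = sum(directory.files[n] for n in old)
    return [n for n in old if not directory.try_delete(n)]


def run(keep_searcher_open):
    d = ToyDirectory()
    d.files["main.cfs"] = 100            # existing index file
    if keep_searcher_open:
        d.open_for_read("main.cfs")      # step 1: the searcher holds the file
    d.files["new.cfs"] = 10              # steps 2-4: deletes and new documents
    leftover = optimize(d, "optimized.cfs")  # step 5
    if keep_searcher_open:
        d.close("main.cfs")              # step 6
    return sorted(d.files), leftover


print(run(keep_searcher_open=True))
print(run(keep_searcher_open=False))
```

In this model, the run with the searcher open ends with both the old main file and the optimized file in the directory, while the run without a searcher ends with the optimized file only. That matches what I see on disk, but it says nothing about where the actual Lucene.Net code goes wrong.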

I can send you more details if you need them.

Simone

George Aroush wrote:

Hi Simone,
 
Lucene.Net 2.1 is supposed to work just like its Java version.  If you are
seeing a difference in this behavior, then something is obviously wrong.  My
question to you is this: do you have C# test code that shows this problem
with Lucene.Net?  Can you port it to Java and verify?  If you can't do all
of this verification, at least give us the C# test code and then I might be
able to take it from here.  This will also have the additional benefit of
verifying that your code is not the issue.
 
Regards,
 
-- George

  _____  

From: Simone Busoli [mailto:[EMAIL PROTECTED]] 
Sent: Wednesday, September 12, 2007 8:48 PM
To: [email protected]
Subject: Re: Lucene.Net 2.1 status

Hi George, thanks for the update. I wanted to ask you something about 2.1.
One of the entries in the Java Lucene FAQ says:

Why do I have a deletable file (and old segment files remain) after running
optimize?

This is normal behavior on Windows whenever you also have readers
(IndexReaders <http://wiki.apache.org/lucene-java/IndexReaders>  or
IndexSearchers <http://wiki.apache.org/lucene-java/IndexSearchers> ) open
against the index you are optimizing. Lucene tries to remove old segments
files once they have been merged (optimized). However, because Windows does
not allow removing files that are open for reading, Lucene catches an
IOException deleting these files and then records these pending
deletable files into the "deletable" file. On the next segments merge, which
happens with explicit optimize() or close() calls and also whenever the
IndexWriter <http://wiki.apache.org/lucene-java/IndexWriter>  flushes its
internal RAMDirectory to disk (every IndexWriter
<http://wiki.apache.org/lucene-java/IndexWriter> .DEFAULT_MAX_BUFFERED_DOCS
(default 10) addDocuments), Lucene will try again to delete these files (and
additional ones) and any that still fail will be rewritten to the deletable
file. 

Note that as of 2.1 the deletable file is no longer used. Instead, Lucene
computes which files are no longer referenced by the index and removes them
whenever a writer is created. 

I'm working on Lucene.Net trunk but I still get the
deletable-files-not-deleted behavior under Windows. Is the new 2.1 behavior
supposed to be working in Lucene.Net as well?

Simone

George Aroush wrote: 

Hi folks,

Lucene.Net 2.1 is stabilizing very well.  Thanks to DIGY who flushed out the
last remaining failing NUnit tests, we are now down to only one test that is
failing: Lucene.Net.Index.TestNorms._TestNorms().

Since Monday, I have been using this version in production with success.  I
would like to get feedback from others if you are using it and how it's working
for you.  If results are good, and pending the elimination of
Lucene.Net.Index.TestNorms._TestNorms(), I think we are ready to vote on this
release and close it.

As for the next step, I'm going to take a look at Lucene Java 2.2 and see
how big a job porting it will be.  I will post on it in a few days.

Regards,

-- George
