"Simon Wistow" <[EMAIL PROTECTED]> wrote: > I'm looking at doing a system which is looks something like this - I > have an IndexSearcher open with a on-disk index but all writes go to a > RAM based IndexWriter. Periodically I do > > 1. Close IndexSearcher > 2. Open new IndexWriter in same location > 3. Use addIndexes with old RAM based IndexWriter > 4. Close new IndexWriter > 4a. Inform something that outstanding writes have been committed > 5. Close and then open afresh old RAM based IndexWriter > 6. Open new IndexSearcher > > A few questions which may showcase my ignorance. > > 1. Is this transactional? Is there at any point where the Index can be > left in an undefined state? One of the reasons for doing this is that if > a crash happens I can know exactly what documents have been 'committed' > and which are still in memory and need to be reindexed. Also, since > addIndexes is transactional there's should be no chance of the index > getting corrupted which was a problem we'd had before.
This should be transactional. The one small caveat is that on hitting
an exception during addIndexes you don't actually know whether all
indices were in fact added or not: it's possible they were all
successfully added, but that an exception was then hit during the
follow-on optimize(). Though, if you use addIndexesNoOptimize, I
believe hitting an exception will always mean no indices were added.

> 2. This seems an awful lot like what IndexWriter does anyway - am I
> just reimplementing the wheel?

I think you could simply open IndexWriter with "autoCommit=false",
instead of using a RAM-based IndexWriter? Then on closing this
IndexWriter, all changes are committed. In the meantime, readers will
always see the index at its starting point, no matter what you are
doing with the writer. You can also call abort() to close the
IndexWriter and revert to the starting state of the index.

But one difference with this approach is that you'd have to use the
writer to do deletes on the index, and these deletes would not be
visible until the writer commits (is closed). So if that's a problem
you should stick with your current approach.

> 3. Because the IndexSearcher (or, more appropriately, its underlying
> IndexReader) can have deletes, I need to close() it before merging
> and reopen it. I also need to reopen it to get all the new documents
> in the new merged index. Currently I'm doing something roughly like
> this:
>
> private void reopenIndex() {
>     index_closed = true;
>     index.close();
>     index = new IndexSearcher(path);
>     index_closed = false;
> }
>
> and
>
> public Results search(String query) {
>     // make sure the index is open
>     while (index_closed) {
>         sleep(1);
>     }
>     // now do the search
>     ...
> }
>
> But I'm convinced there must be a better way than this. A quick first
> attempt at this:
>
> private void reopenIndex() {
>     IndexSearcher old_index = index;
>     index = new IndexSearcher(path);
>     old_index.close();
> }
>
> throws StaleIndex exceptions.

Is that a StaleReaderException? The problem here is that your newly
opened reader is stale: when you close your old index, it flushes its
deletes and advances the index. If you can do transactional deletes
with IndexWriter (above) then this problem should go away. If you
cannot, then I think you'd have to re-open your new IndexReader once
the old IndexReader has been closed (and has flushed its deletes),
before trying to do any deletes with the new IndexReader.

Mike
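P.S. Two rough sketches in case they help; neither is tested, and the
class and field names are mine.

For the autoCommit=false approach (question 2), assuming a Lucene
release that has the autoCommit constructors and abort() (2.2 or
later), usage is roughly:

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class BatchedWrites {

    public static void applyBatch(String path) throws IOException {
        Directory dir = FSDirectory.getDirectory(path);
        // autoCommit=false: readers keep seeing the index as of the
        // writer's starting point until close() commits everything
        IndexWriter writer =
            new IndexWriter(dir, false, new StandardAnalyzer());
        boolean success = false;
        try {
            // adds and deletes are buffered, invisible to readers...
            writer.deleteDocuments(new Term("id", "42"));
            // ...until this close(), which commits them all at once
            writer.close();
            success = true;
        } finally {
            if (!success)
                writer.abort(); // revert to the index's starting state
        }
    }
}

And for question 3, one way to do the swap without the sleep loop:
close the old searcher first (so its buffered deletes are flushed),
then open the new one, both under a lock that search() also takes:

import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class SearcherHolder {

    private final Object lock = new Object();
    private IndexSearcher searcher;

    public SearcherHolder(String path) throws IOException {
        searcher = new IndexSearcher(path);
    }

    public void reopenIndex(String path) throws IOException {
        synchronized (lock) {
            searcher.close();                   // flushes buffered deletes
            searcher = new IndexSearcher(path); // sees the flushed state
        }
    }

    public TopDocs search(Query query, int n) throws IOException {
        synchronized (lock) {
            // run the whole search while holding the lock so the
            // searcher can't be closed out from under us mid-search
            return searcher.search(query, null, n);
        }
    }
}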