RE: Back on this single write thing

Kurt Mackey Tue, 26 Jun 2007 18:30:39 -0700

The answer to A is basically that, in the context of Lucene, a delete is more 
like a Read than a Write. :)  Also, the split doesn't really make much sense.


For B: The "recommended" way to handle this is generally like this:

1. Get the batch of documents to update
2. Open a new IndexReader
3. Loop through the documents to delete them
4. Close the IndexReader you were using for deletes
5. Open your IndexWriter
6. Write your documents out
7. Close the IndexWriter
8. Reopen the "main" IndexSearcher so it sees the updated index
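
A rough sketch of steps 2 through 8, assuming the Lucene.Net 2.x API that the 
quoted singleton below already relies on.  The BatchUpdater class, the 
ApplyBatch method, and the "id" field are made up for illustration; "id" is 
assumed to be a unique key that was stored and indexed untokenized, so an 
exact Term match finds it.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper, not part of Lucene.Net.
public static class BatchUpdater
{
    // Deletes the old versions, writes the new ones, and returns a fresh
    // searcher. Assumes every document carries a stored "id" field.
    public static IndexSearcher ApplyBatch(string indexPath,
                                           IList<Document> batch,
                                           IndexSearcher oldSearcher)
    {
        // Steps 2-4: delete through a short-lived IndexReader and close it
        // before any IndexWriter is opened on the same index.
        IndexReader reader = IndexReader.Open(indexPath);
        try
        {
            foreach (Document doc in batch)
            {
                reader.DeleteDocuments(new Term("id", doc.Get("id")));
            }
        }
        finally
        {
            reader.Close();
        }

        // Steps 5-7: write the new versions (create = false keeps the index).
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        try
        {
            foreach (Document doc in batch)
            {
                writer.AddDocument(doc);
            }
        }
        finally
        {
            writer.Close();
        }

        // Step 8: swap in a searcher that sees the updated index.
        if (oldSearcher != null)
        {
            oldSearcher.Close();
        }
        return new IndexSearcher(indexPath);
    }
}
```

If other threads may still be searching with the old searcher, closing it 
right away is not safe; in that case the swap would need some coordination 
that this sketch leaves out.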

I ended up implementing sort of a work queue for Index operations.  My basic 
work units are:

* Index - queues a document to be added/updated
* Commit - commits anything to be written, using the steps above
* Optimize - optimizes the index

One issue I had with the above, though, was that I might end up queueing new 
versions of the same document a few times before the commit.  This resulted in 
duplicate entries, since the delete and write were separate operations: the 
old version was deleted once, but every queued version was then written.  
Cramming everything into a hashtable made for an easy enough fix, but I hadn't 
actually thought about it before seeing duplicates. :)
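
A minimal sketch of that kind of fix, with no Lucene dependency.  The 
PendingUpdates class and its method names are invented for illustration, and 
it assumes each document has a string key that identifies it uniquely.  
Queueing the same key twice overwrites the earlier version, so a commit only 
ever sees one copy per document.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical buffer for queued "Index" work units, keyed by document id.
public sealed class PendingUpdates<TDoc>
{
    private readonly Dictionary<string, TDoc> pending = new Dictionary<string, TDoc>();
    private readonly object gate = new object();

    // Queue a document; a later version with the same key replaces the
    // earlier one instead of piling up next to it.
    public void Index(string key, TDoc doc)
    {
        lock (gate)
        {
            pending[key] = doc;
        }
    }

    // Called at commit time: hands back one entry per key and starts a
    // fresh, empty batch.
    public List<KeyValuePair<string, TDoc>> Drain()
    {
        lock (gate)
        {
            List<KeyValuePair<string, TDoc>> batch =
                new List<KeyValuePair<string, TDoc>>(pending);
            pending.Clear();
            return batch;
        }
    }
}
```

The commit step would then delete and re-add exactly the entries that Drain() 
returns, however many times each document was queued in between.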

-Kurt

-----Original Message-----
From: Patrick Burrows
Sent: Tuesday, June 26, 2007 7:48 PM
To: [email protected]
Subject: Back on this single write thing

I created a singleton IndexWriter, pasted below in case anyone else wants
it [1]. But now I have a bit of a problem. Someone mentioned that I can't
have my index readers delete either. Makes sense, since that is a write
operation.

I just realized that one of the processes I am moving to use the new
singleton stuff is a "Refresh()" method. It loops through each document,
deletes it (using an indexreader) and then immediately recreates it (using
an indexwriter).

A -- why aren't these methods (delete and add) part of the same class?

B -- but, more importantly, (and less whining)... how do you handle this?
From my understanding you can't just update fields in an already indexed
document. You have to delete it and then re-add it. This operation
necessarily involves a Delete and an Add. Any thoughts would be helpful.




[1]

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using FullTextSearch.Tasks.Properties;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Directory=Lucene.Net.Store.Directory;

namespace FullTextSearch.Tasks
{
    public sealed class IndexWriterSingleton : IndexWriter
    {
        private static readonly IndexWriterSingleton instance =
            new IndexWriterSingleton(Settings.Default.IndexPath,
                new StandardAnalyzer(), false);

        static readonly object lockhandle = new object();
        static IndexWriterSingleton(){}

        public static IndexWriterSingleton Instance
        {
            get { return instance; }
        }

        public IndexWriterSingleton(FileInfo path, Analyzer a, bool create)
            : base(path, a, create){}
        public IndexWriterSingleton(string path, Analyzer a, bool create)
            : base(path, a, create){}
        public IndexWriterSingleton(Directory d, Analyzer a, bool create)
            : base(d, a, create){}

        public override void AddDocument(Lucene.Net.Documents.Document doc)
        {
            lock (lockhandle)
            {
                base.AddDocument(doc);
            }
        }

        public override void AddDocument(Lucene.Net.Documents.Document doc,
            Analyzer analyzer)
        {
            lock (lockhandle)
            {
                base.AddDocument(doc, analyzer);
            }
        }

        public override void Optimize()
        {
            lock (lockhandle)
            {
                base.Optimize();
            }
        }
    }
}


--
-
P
