RE: Back on this single write thing

Kurt Mackey Tue, 26 Jun 2007 19:24:20 -0700

Yup, between the Java Lucene community/docs and the Solr source code I've been
able to find out quite a bit.

At times I'm annoyed that Lucene.NET isn't more ".NET like", but it's generally 
useful that it matches up so well to Lucene.

-Kurt

From: Simone Busoli [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 26, 2007 9:19 PM
To: [email protected]
Subject: Re: Back on this single write thing

As far as I can see there's no .NET documentation, but since Lucene.Net looks
like a class-for-class port of Lucene, and the indexes use the same, compatible
format, the Java docs work very well for Lucene.Net too.

Simone

Patrick Burrows wrote:
ooh, good resource, hadn't seen the wiki. ...of course, I've been staying
away from all the java versions of lucene anyway for the express purpose of
growing the .net specific documentation... but I'll look there.

On 6/26/07, Simone Busoli <[EMAIL PROTECTED]> wrote:

These topics are discussed in the Lucene (Java) documentation (
http://wiki.apache.org/lucene-java/UpdatingAnIndex); you should take a
look there, and skim the Lucene in Action book as well. You may also want
to look at the IndexModifier class.

Simone

Kurt Mackey wrote:

The answer to A is: basically because a delete is more like a Read than a 
Write, in the context of Lucene. :)  Also, it doesn't really make much sense.

For B: The "recommended" way to handle this is generally like this:

1. Get the batch of documents to update
2. Open a new IndexReader
3. Loop through the documents to delete them
4. Close the IndexReader you were using for deletes
5. Open your IndexWriter
6. Write your documents out
7. Close the IndexWriter
8. Reopen the "main" IndexSearcher so it sees the updated index
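
A minimal sketch of the steps above in C#, for illustration only. It assumes every document carries a unique key in a stored, untokenized field called "id" (a hypothetical field name, not something from this thread), that the index already exists at indexPath, and that the Lucene.Net 2.x calls behave like their Java counterparts. IndexUpdater and Commit are made-up names.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class IndexUpdater
{
    // Hypothetical helper. "id" is an assumed unique-key field; every
    // document in the batch must have a non-null value for it.
    public static IndexSearcher Commit(string indexPath, ICollection<Document> batch)
    {
        // Steps 2-4: delete the old versions through a short-lived reader.
        IndexReader reader = IndexReader.Open(indexPath);
        try
        {
            foreach (Document doc in batch)
            {
                reader.DeleteDocuments(new Term("id", doc.Get("id")));
            }
        }
        finally
        {
            // Close before opening the writer so the two never overlap.
            reader.Close();
        }

        // Steps 5-7: append the new versions to the existing index
        // (create = false keeps what is already there).
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        try
        {
            foreach (Document doc in batch)
            {
                writer.AddDocument(doc);
            }
        }
        finally
        {
            writer.Close();
        }

        // Step 8: hand back a fresh searcher that sees the updated index.
        // The caller is expected to close the searcher it replaces.
        return new IndexSearcher(indexPath);
    }
}
```

Keeping the reader and the writer in two strictly separate phases is the point of the ordering: the deletes are finished and the reader is closed before the writer is opened.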

I ended up implementing sort of a work queue for Index operations.  My basic 
work units are:

* Index - queues a document to be added/updated
* Commit - commits anything to be written, using the steps above
* Optimize - optimizes the index

One issue I had with the above, though, was that I might end up adding new 
versions of the same document a few times before the commit.  This resulted in 
duplicate entries, since the delete and write were separate operations.  
Cramming everything into a hashtable made for an easy enough fix, but I hadn't 
actually thought about it before seeing duplicates. :)
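
The hashtable fix could look roughly like the sketch below. It is not the actual queue code: PendingDocuments, Index and Drain are hypothetical names, and the key is assumed to be whatever uniquely identifies a document. Queuing a newer version under the same key simply overwrites the older one, so each document is written at most once per commit.

```csharp
using System.Collections.Generic;

// Hypothetical pending-work buffer: one slot per document key.
public sealed class PendingDocuments<TDoc>
{
    private readonly Dictionary<string, TDoc> pending = new Dictionary<string, TDoc>();
    private readonly object sync = new object();

    // "Index" work unit: remember the latest version of the document.
    public void Index(string id, TDoc doc)
    {
        lock (sync)
        {
            pending[id] = doc; // overwrites any earlier version with this key
        }
    }

    // Number of distinct documents waiting for the next commit.
    public int Count
    {
        get { lock (sync) { return pending.Count; } }
    }

    // "Commit" work unit: take everything queued so far and reset the buffer.
    public List<TDoc> Drain()
    {
        lock (sync)
        {
            List<TDoc> batch = new List<TDoc>(pending.Values);
            pending.Clear();
            return batch;
        }
    }
}
```

For example, calling Index("42", ...) twice before a commit leaves a single entry for "42", and Drain() returns only the second version.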

-Kurt

-----Original Message-----
From: Patrick Burrows [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 26, 2007 7:48 PM
To: [email protected]
Subject: Back on this single write thing

I created a singleton IndexWriter, pasted below in case anyone else wants
it [1]. But now I have a bit of a problem. Someone mentioned that I can't
have my index readers delete either. Makes sense, since that is a write
operation.

I just realized that one of the processes I am moving to use the new
singleton stuff is a "Refresh()" method. It loops through each document,
deletes it (using an indexreader) and then immediately recreates it (using
an indexwriter).

A -- why aren't these methods (delete and add) part of the same class?

B -- but, more importantly, (and less whining)... how do you handle this?
From my understanding you can't just update fields in an already indexed
document. You have to delete it and then re-add it. This operation
necessarily involves a Delete and an Add. Any thoughts would be helpful.

[1]

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using FullTextSearch.Tasks.Properties;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Directory=Lucene.Net.Store.Directory;

namespace FullTextSearch.Tasks
{
    public sealed class IndexWriterSingleton : IndexWriter
    {
        private static readonly IndexWriterSingleton instance =
            new IndexWriterSingleton(Settings.Default.IndexPath, new StandardAnalyzer(), false);

        static readonly object lockhandle = new object();
        static IndexWriterSingleton(){}

        public static IndexWriterSingleton Instance
        {
            get { return instance; }
        }

        public IndexWriterSingleton(FileInfo path, Analyzer a, bool create)
            : base(path, a, create){}
        public IndexWriterSingleton(string path, Analyzer a, bool create)
            : base(path, a, create){}
        public IndexWriterSingleton(Directory d, Analyzer a, bool create)
            : base(d, a, create){}

        public override void AddDocument(Lucene.Net.Documents.Document doc)
        {
            lock (lockhandle)
            {
                base.AddDocument(doc);
            }
        }

        public override void AddDocument(Lucene.Net.Documents.Document doc, Analyzer analyzer)
        {
            lock (lockhandle)
            {
                base.AddDocument(doc, analyzer);
            }
        }

        public override void Optimize()
        {
            lock (lockhandle)
            {
                base.Optimize();
            }
        }
    }
}

--
-
P
