One immediate optimization would be to close the writer and open the reader only if the document is already present. You can have a reader open and run searches while indexing (and optimization) is underway. Only the delete operation requires you to close the writer, so that two different objects are not trying to update the same index.
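Here is a minimal sketch of the optimization just described. It uses a toy in-memory stand-in for the index, and `ToyIndex`, `writerReopens`, and the sample keys are hypothetical, not Lucene API. The presence check runs while the writer stays open, and the close/delete/reopen cycle happens only when an older copy actually exists. In Lucene itself the check would be a lookup on a reader, which can stay open alongside the writer.

```java
import java.util.HashMap;
import java.util.Map;

public class PresenceCheckSketch {

    // Hypothetical stand-in for the index: unique key ("xpin") -> document body.
    // It only counts writer close/reopen cycles; it is not Lucene code.
    static class ToyIndex {
        final Map<String, String> docs = new HashMap<>();
        int writerReopens = 0;

        // Presence check; in the real system a reader can do this while the writer is open.
        boolean contains(String xpin) {
            return docs.containsKey(xpin);
        }

        void update(String xpin, String body) {
            if (contains(xpin)) {
                // Only here: close writer, delete through a reader, reopen writer.
                writerReopens++;
                docs.remove(xpin);
            }
            // Add through the writer, which stayed open for brand-new records.
            docs.put(xpin, body);
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.update("A", "v1");
        index.update("B", "v1");
        index.update("C", "v1");
        index.update("A", "v2"); // the only update that needs a delete
        System.out.println("writer reopens: " + index.writerReopens + ", docs: " + index.docs.size());
    }
}
```

With the per-record approach every one of those four updates would cost a writer close and reopen. In this sketch only the update to an existing key does.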

However, that is rendered moot by the much bigger optimization. Open the reader once, do all your deletes, close the reader, and then do all your adds. I.e., batch your updates as much as possible.
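Here is a minimal sketch of the batched approach. It again uses a hypothetical in-memory `ToyIndex` in place of real Lucene classes. All deletes happen in one reader session and all adds in one writer session, so the number of sessions no longer grows with the number of records.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class BatchUpdateSketch {

    // Hypothetical stand-in for the index (xpin -> document body) that counts
    // how many reader and writer sessions a strategy opens. Not Lucene API.
    static class ToyIndex {
        final Map<String, String> docs = new HashMap<>();
        int readerSessions = 0;
        int writerSessions = 0;

        // The batch is keyed by xpin, so duplicates within one batch collapse to the last version.
        void applyBatch(Map<String, String> batch) {
            // Phase 1: one reader session deletes every key that is about to be re-added.
            readerSessions++;
            for (String xpin : batch.keySet()) {
                docs.remove(xpin);
            }
            // Phase 2: one writer session adds every new version.
            writerSessions++;
            for (Map.Entry<String, String> e : batch.entrySet()) {
                docs.put(e.getKey(), e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.docs.put("A", "v1"); // already indexed at T1
        index.docs.put("B", "v1");

        Map<String, String> batch = new LinkedHashMap<>();
        batch.put("A", "v2"); // re-delivered at T2
        batch.put("C", "v1");
        batch.put("D", "v1");
        index.applyBatch(batch);

        System.out.println(index.docs.size() + " docs, " + index.readerSessions
                + " reader session, " + index.writerSessions + " writer session");
    }
}
```

Keying the batch by `xpin` matters. If the same record appeared twice in one batch, deleting first and then adding both copies would leave two versions in a real index.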

If you have to do them one at a time, then the first optimization should help some.

Chuck

Jayakumar.V wrote:

Hi,

Maybe this query has been answered before. My first email to this user group
did not get any response. I had sent it to the following email addresses:

[EMAIL PROTECTED]
java-user@lucene.apache.org

This is my second email to this address. I hope I've reached the right place.

We are indexing documents on a scheduled basis. A document that was indexed
at time T1 will be available again for indexing at time T2 with some
additional fields. I need to ensure that only the version received at time
T2 is present in the index, so I first have to check whether the record is
already in the index and, if so, delete it before indexing the new version.
I've taken the cue from a code snippet in the TSS case study in the book
Lucene in Action.
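The requirement above amounts to replacing a document by its unique key: delete every document whose `xpin` matches, then add the new version. The minimal sketch below assumes a document is just a map of field names to values and the index is a plain list. Apart from `xpin`, the field names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplaceByKeySketch {

    // Delete every document with the same xpin, then add the new one,
    // so at most one version per xpin remains in the (toy) index.
    static void replace(List<Map<String, String>> index, Map<String, String> doc) {
        String key = doc.get("xpin");
        index.removeIf(d -> key.equals(d.get("xpin")));
        index.add(doc);
    }

    public static void main(String[] args) {
        List<Map<String, String>> index = new ArrayList<>();

        Map<String, String> atT1 = new HashMap<>();
        atT1.put("xpin", "1001");
        atT1.put("title", "version received at T1");
        replace(index, atT1);

        // Same record at T2, with an additional field.
        Map<String, String> atT2 = new HashMap<>(atT1);
        atT2.put("title", "version received at T2");
        atT2.put("extra", "field added at T2");
        replace(index, atT2);

        // prints "1 document: version received at T2"
        System.out.println(index.size() + " document: " + index.get(0).get("title"));
    }
}
```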




The steps I've followed are as follows:

- prepare the Document for indexing
- close the existing IndexWriter instance
- get an IndexReader instance on the index
- check whether the record about to be indexed is already in the index
- if YES, delete it and close the IndexReader instance
- open the IndexWriter instance again
- add the Document to the index

This cycle is repeated for every record being indexed. Is this the right way
to go about it? It took nearly 3 hours to index 250,000 records.
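For scale, the figures above work out to roughly 43 ms per record on average. This is plain arithmetic on the quoted numbers, not a new measurement:

```latex
\frac{3\ \text{h} \times 3600\ \text{s/h}}{250{,}000\ \text{records}}
  = \frac{10{,}800\ \text{s}}{250{,}000\ \text{records}}
  \approx 0.043\ \text{s} \approx 43\ \text{ms per record}
```

A fixed cost paid on every record, such as the writer close/reopen cycle in the steps above, would plausibly account for a large share of that.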



I'm attaching the code snippet my application uses for deleting and adding a
record:

   private void addIndex(Document doc, Map dataMap) {
       IndexReader indexReader = null;

       // check if the doc. is already indexed;
       // if YES, first remove it before adding the document
       try {
           // first, close the underlying IndexWriter instance:
           // we can't have two index-modifying instances open at the same time
           closeWriterIndex();

           // get an IndexReader instance
           indexReader = IndexReader.open(fsDir);

           // build the Term that identifies the doc. to delete
           Term term = new Term("xpin", (String) dataMap.get("xpin"));

           // now, remove the previously added doc. (if any)
           indexReader.delete(term);
       } catch (IOException e) {
           e.printStackTrace();
       } finally {
           // close the reader instance after deleting the doc.;
           // it is still null if IndexReader.open() itself failed
           if (indexReader != null) {
               try {
                   indexReader.close();
               } catch (IOException e) {
                   e.printStackTrace();
               }
           }
       }

       try {
           // now, reopen the IndexWriter
           openWriterIndex();

           // index the document
           fsWriter.addDocument(doc);
       } catch (IOException e) {
           e.printStackTrace();
       }
   }

   private void closeWriterIndex() {
       try {
           fsWriter.close();
       } catch (IOException e) {
           e.printStackTrace();
       }
   }

   private void openWriterIndex() {
       try {
           // false = open the existing index instead of creating a new one
           fsWriter = new IndexWriter(fsDir, new StandardAnalyzer(), false);
           // a higher mergeFactor means fewer segment merges during bulk indexing
           fsWriter.mergeFactor = 100;
       } catch (IOException e) {
           e.printStackTrace();
       }
   }



I'm in the final stages of deploying this module. Any suggestions or ideas
that help me finish it quickly would be appreciated.

TIA

Jayakumar.V

--
*Chuck Williams*
All Things Local
Founder and CEO
V: (415)464-1889
C: (415)846-9018
[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
AIM: hawimanawiz
