However, that is rendered moot by the much bigger optimization. Open the reader once, do all your deletes, close the reader, and then do all your adds. I.e., batch your updates as much as possible.
If you have to them one at a time, then the first optimization should help some.
Chuck
Jayakumar.V wrote:
Hi,
Maybe this query has been answered before. My first email to this user group
did not generate any response. I had forwarded it to the following email ids
:
[EMAIL PROTECTED]
java-user@lucene.apache.org
This is my second email to this mail id. Hope I've reached the right place.
We are indexing documents on a scheduled basis. A document which was indexed
at time T1 will be available again for indexing at time T2 with certain
additional fields. Now, I need to ensure that only the document received at
time T2 is present in the index, for which I need to first identify if the
record is present in the index & then delete it before indexing the same.
I've taken the cue from a code snippet available in the TSS case study in
the book Lucene In Action.
The steps I've followed is as below :
- prepare the Document for indexing
- close the existing IndexWriter instance
- get an IndexReader instance to the index
- check if the record going to be indexed is already available in
the index
- if YES, delete it & close the IndexReader instance
- open the IndexWriter instance again
- add the Document to the index
Now, this is an iterative process for each record being indexed. Is it the right way to go about doing this? It took nearly 3 hours to index 250,000 records.
I'm attaching the code snippet used in my app. for deleting & adding the record :
private void addIndex(Document doc, Map dataMap) {
IndexReader indexReader = null;
// check if the doc. is already indexed.
// if YES, first remove it b4 adding the document
try {
// first, close the undelying IndexWriter instance
// v can't have 2 index modifying instances open at the same time
closeWriterIndex();
// get an IndexReader instance
indexReader = IndexReader.open(fsDir);
// get a Term obj. for deletion
Term term = new Term("xpin",(String)dataMap.get("xpin"));
// now, remove the already added doc.
indexReader.delete(term);
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
// close the reader instance after deleting the doc.
indexReader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
try {
// now, reopen the index writer object
openWriterIndex();
// index the document
fsWriter.addDocument(doc);
} catch (IOException e) {
e.printStackTrace();
}
}
private void closeWriterIndex() {
try {
fsWriter.close();
} catch (IOException e) {
e.printStackTrace();
}
}
private void openWriterIndex() {
try {
fsWriter = new IndexWriter(fsDir, new StandardAnalyzer(), false);
fsWriter.mergeFactor = 100;
} catch (IOException e) {
e.printStackTrace();
}
}
I'm at the final stages of deploying this module. Any suggestions / ideas
will be helpful in completing it fast.
TIA
Jayakumar.V
-- *Chuck Williams* All Things Local Founder and CEO V: (415)464-1889 C: (415)846-9018 [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]> AIM: hawimanawiz
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]