I think I finally see how this is supposed to optimize: it remembers
the delete terms and then performs the deletions in batches.
We avoid all of this messiness by making sure each document has a
primary key and always removing/updating by primary key. That way we
can keep the operations in an ordered list (really an ordered set,
since the keys are unique), and multiple updates to the same document
in a batch can be coalesced.
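In code, the coalescing buffer could be as simple as this sketch
(none of these names come from the patch):

import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.lucene.document.Document;

// Sketch only. A LinkedHashMap keeps operations in insertion order, and a
// later put() on the same key replaces the earlier entry, so multiple
// updates to one document within a batch coalesce into a single operation.
class UpdateBuffer {
    final Map<String, Document> pending = new LinkedHashMap<String, Document>();

    void add(String key, Document doc) { pending.put(key, doc); }

    void delete(String key) { pending.put(key, null); } // null == delete-only
}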
I guess I still don't see why the change is so involved, though...
I would just maintain an ordered list of operations (deletes and adds)
on the "buffered writer".
When the "buffered" writer is closed:
Create a RamDirectory.
Perform all deletions in a batch on the main IndexReader.
Perform ordered deletes and adds on the RamDirectory.
Merge the RamDirectory with the main index.
This could all be encapsulated in a BufferedIndexWriter class.
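A rough sketch of what that close() might look like against the
Lucene 2.0 API, reusing the pending map from the sketch above
(mainDir and analyzer are assumed fields; nothing here is from the
patch):

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

void close() throws IOException {
    // 1. Batch all deletions against the main index.
    IndexReader reader = IndexReader.open(mainDir);
    for (String key : pending.keySet()) {
        reader.deleteDocuments(new Term("key", key));
    }
    reader.close(); // commits the deletes and releases the write lock

    // 2. Write the new documents to a RAMDirectory, in order. Because ops
    //    were coalesced per key, the RAMDirectory only ever needs adds.
    RAMDirectory ram = new RAMDirectory();
    IndexWriter ramWriter = new IndexWriter(ram, analyzer, true);
    for (Document doc : pending.values()) {
        if (doc != null) ramWriter.addDocument(doc); // null == delete-only
    }
    ramWriter.close();

    // 3. Merge the RAMDirectory into the main index in one pass.
    IndexWriter writer = new IndexWriter(mainDir, analyzer, false);
    writer.addIndexes(new Directory[] { ram });
    writer.close();
}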
On Jul 6, 2006, at 4:34 PM, robert engels wrote:
I guess I don't see the difference...
You need the write lock to use the IndexWriter, and you also need the
write lock to perform a deletion, so once you hold the write lock you
can perform the deletion and the add, then close the writer.
I have asked how this submission optimizes anything, and I still
can't seem to get an answer.
On Jul 6, 2006, at 4:27 PM, Otis Gospodnetic wrote:
I think that patch is for a different scenario: the one where you
can't wait to batch deletes and adds, and want/need to execute
them more frequently and in the order they actually happen,
without grouping them.
Otis
----- Original Message ----
From: robert engels <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Thursday, July 6, 2006 3:24:13 PM
Subject: Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)
I guess we just chose a much simpler way to do this...
Even with your code changes, to see the modifications made through the
IndexWriter, the writer must be closed and a new IndexReader opened.
So a far simpler way is to gather the collection of updates first and
then, with an open IndexReader, do roughly the following (directory,
analyzer, and updates are placeholders; "key" is the unique-key field):

IndexReader reader = IndexReader.open(directory);
for (Document doc : updates) {
    reader.deleteDocuments(new Term("key", doc.get("key")));
}
reader.close(); // commits the deletes and releases the write lock

IndexWriter writer = new IndexWriter(directory, analyzer, false);
for (Document doc : updates) {
    writer.addDocument(doc);
}
writer.close();

reader = IndexReader.open(directory); // reopen to see the changes
I don't see how your way is any faster. You must always flush to disk
and open a new IndexReader to see the changes.
On Jul 6, 2006, at 2:07 PM, Ning Li wrote:
Hi Otis and Robert,
I added an overview of my changes in JIRA. Hope that helps.
Anyway, my test did exercise the small batches, in that in our
incremental updates we delete the documents matching the unique term
and then add the new ones (which is what I assumed this was
improving), and I saw no appreciable difference.
Robert, could you describe a bit more how your test is set up? Or a
short code snippet would help me explain.
Without the patch, when inserts and deletes are interleaved in small
batches, the performance can degrade dramatically because the
ramDirectory is flushed to disk whenever an IndexWriter is closed,
causing a lot of small segments to be created on disk, which
eventually need to be merged.

Is this how your test is set up? And what are the maxBufferedDocs and
the maxBufferedDeleteTerms in your test? You won't see a performance
improvement if they are about the same as the small batch size. The
patch works by internally buffering inserts and deletes into larger
batches.
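For example, with the patch an interleaved delete/add stream can stay
inside a single writer. A sketch (dir, analyzer, updates, and the
"key" field are placeholders; the setter and delete methods follow the
parameter names above, and exact signatures may differ):

IndexWriter writer = new IndexWriter(dir, analyzer, false);
writer.setMaxBufferedDocs(1000);        // flush added docs in large batches
writer.setMaxBufferedDeleteTerms(1000); // likewise for buffered delete terms
for (Document doc : updates) {
    // delete-then-add by unique key, without closing the writer in between
    writer.deleteDocuments(new Term("key", doc.get("key")));
    writer.addDocument(doc);
}
writer.close(); // one flush and merge instead of one per small batch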
Regards,
Ning