Dear developers,

is there any architectural reason why an IndexWriter could not
delete a document?

I understand that the IndexReader (besides its strange naming for this
feature) is the right class to use to delete a document, but this
raises a huge problem for me.

We add almost 50,000 documents a day, while deleting a similar amount
of old documents over the same period.
We index new documents in batches every 5 minutes while deleting the old
ones, and we optimize the index twice a day, in order to keep good
performance for the queries and the number of index files under
control.

In this situation, I try to keep the same IndexWriter open as much as
possible, in order to avoid any unnecessary fragmentation of the
index.
Before indexing any document, I can check to see if the document has
already been inserted, but I am not able to delete it without closing
the IndexWriter, opening an IndexReader, deleting the document,
closing the IndexReader and opening the IndexWriter again.

This arrangement seems reasonable if updated documents are scarce, but
it does not seem workable with a high rate of updated documents.

I would prefer to avoid deleting all updated documents from the index
before opening the IndexWriter because the updating and indexing
procedure would get much more complex, and because I would introduce a
significant time gap during which a previously available document is
no longer available in the index.
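
To make the trade-off concrete, here is a toy model in plain Java. It
is only a sketch: ToyIndex, Mode, and every method on them are invented
for illustration and are not part of the Lucene API. It assumes that
only a writer or a reader can be open at any one time, and it simply
counts how often one of them has to be opened under each approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: ToyIndex is a made-up stand-in, not a Lucene class.
class UpdateCycleSketch {

    enum Mode { CLOSED, WRITER, READER }

    static class ToyIndex {
        final List<String> ids = new ArrayList<>();
        Mode mode = Mode.CLOSED;
        int opens = 0; // how many times a writer or reader had to be opened

        // Assumption: opening one side implicitly closes the other.
        void open(Mode wanted) {
            if (mode != wanted) {
                mode = wanted;
                opens++;
            }
        }

        void add(String id) {
            if (mode != Mode.WRITER) throw new IllegalStateException("writer not open");
            ids.add(id);
        }

        void delete(String id) {
            if (mode != Mode.READER) throw new IllegalStateException("reader not open");
            ids.removeIf(id::equals);
        }
    }

    // Per-document approach: switch to a reader for every document that already exists.
    static int perDocument(ToyIndex index, List<String> batch) {
        index.open(Mode.WRITER);
        for (String id : batch) {
            if (index.ids.contains(id)) {
                index.open(Mode.READER);
                index.delete(id);
                index.open(Mode.WRITER);
            }
            index.add(id);
        }
        return index.opens;
    }

    // Batched approach: delete every updated document first, then add the whole batch.
    static int batched(ToyIndex index, List<String> batch) {
        index.open(Mode.READER);
        for (String id : batch) {
            index.delete(id);
        }
        index.open(Mode.WRITER);
        for (String id : batch) {
            index.add(id);
        }
        return index.opens;
    }

    // Builds a toy index that already contains the given document ids.
    static ToyIndex seeded(String... existing) {
        ToyIndex index = new ToyIndex();
        index.ids.addAll(Arrays.asList(existing));
        return index;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("a", "b", "c", "d");
        System.out.println("per-document opens: " + perDocument(seeded("a", "b", "c"), batch));
        System.out.println("batched opens: " + batched(seeded("a", "b", "c"), batch));
    }
}
```

In this model the per-document approach needs one open plus two more
for every document that already exists, while the batched approach
always needs exactly two. The price of the batched approach is the gap
between the deletes and the adds, during which the updated documents
are missing from the index.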

Can you confirm my idea that keeping an IndexWriter open as much as
possible while indexing batches of documents is a "good thing"?

Is there any chance of ever seeing a deleteDocument method in the
IndexWriter class, or should I start planning how to handle the update
of documents in another way?

Thank you very much for your attention.

Regards,

Giulio Cesare Solaroli
