Hi all,

I would like to propose a "think different" approach to this problem
for you developers to evaluate.

Would it be possible to just mark the relevant documents as "deleted"
(instead of deleting them altogether) with an IndexWriter used for
inserting new documents?

"Marking" a document as deleted would leave it in the index, but
would exclude it from every result set.

At a later time, an IndexReader could be opened to really delete all
"marked" documents.

Is this approach compatible with the Lucene architecture?
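
To make the idea concrete, here is a rough sketch in plain Java. It is
only a toy in-memory model and does not use the Lucene API: the class
MarkDeletedIndex and all of its methods are hypothetical names made up
for illustration. It assumes documents are identified by their position
in a list and that "marking" is just a bit set consulted at search time.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy in-memory model of "mark now, purge later". Not Lucene code:
// every class and method name here is hypothetical.
class MarkDeletedIndex {
    private final List<String> docs = new ArrayList<>();
    private final BitSet marked = new BitSet(); // doc numbers marked as deleted

    // Append a document and return its document number.
    int add(String text) {
        docs.add(text);
        return docs.size() - 1;
    }

    // Mark a document as deleted; it stays in the index for now.
    void markDeleted(int docNum) {
        if (docNum < 0 || docNum >= docs.size()) {
            throw new IndexOutOfBoundsException("no such document: " + docNum);
        }
        marked.set(docNum);
    }

    // Return the numbers of all unmarked documents containing the substring.
    List<Integer> search(String word) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!marked.get(i) && docs.get(i).contains(word)) {
                hits.add(i);
            }
        }
        return hits;
    }

    // Physically remove all marked documents and return how many were
    // removed. Surviving documents are renumbered.
    int purge() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!marked.get(i)) {
                kept.add(docs.get(i));
            }
        }
        int removed = docs.size() - kept.size();
        docs.clear();
        docs.addAll(kept);
        marked.clear();
        return removed;
    }

    // Number of documents physically present, marked or not.
    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        MarkDeletedIndex index = new MarkDeletedIndex();
        int oldDoc = index.add("lucene index v1");
        index.markDeleted(oldDoc);        // hide the old version
        index.add("lucene index v2");     // insert the new version
        System.out.println(index.search("lucene")); // prints [1]
        index.purge();                    // the later clean-up pass
        System.out.println(index.search("lucene")); // prints [0]
    }
}
```

Note that in this toy model the surviving document changes its number
from 1 to 0 after the purge, so any document numbers remembered from
before the purge would no longer be valid.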

Regards,

Giulio Cesare



On Mon, 19 Jul 2004 20:44:26 -0700, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Dmitry Serebrennikov wrote:
> > Doug Cutting wrote:
> >
> >> Dmitry Serebrennikov wrote:
> >>
> >>> So here's a modified sequence of operations, perhaps a bit more
> >>> efficient than proposed by Christoph:
> >>> 1) Open an IndexReader for searching - S. Keep it open until the
> >>> transaction is committed.
> >>> 2) Open a second IndexReader for deletions - D.
> >>> 3) Create a filter bitset F (or use any other mechanism for storing
> >>> document numbers to be deleted)
> >>> 4) Open an IndexWriter for new documents - W.
> >>> 5) As documents come in, add them using W. Find their old versions in
> >>> D and record their document numbers in F. D will not show any new
> >>> documents, only documents present at the time D was created.
> >>> 6) Close W.
> >>> 7) Use D to delete all documents marked in F.
> >>> 8) Close D.
> >>
> >>
> >>
> >> What happens if there are deletions in S and D, and then, in step 5,
> >> as documents are added to W and segments are merged, documents are
> >> renumbered?  Wouldn't that invalidate F?  Currently we don't permit
> >> one to delete documents from an IndexReader while an IndexWriter is
> >> open, to prevent this sort of thing.  Am I missing something?
> >
> >
> > I was assuming that there would never be deletions in S.
> 
> Then you need to ensure that the index has no deletions, and
> optimize it if it has any, to remove them.  This is probably most safely
> done as the first step, rather than the last.
> 
> I'm not sure this method has many advantages over what Christoph
> originally suggested in:
> 
> http://www.mail-archive.com/lucene-dev%40jakarta.apache.org/msg06165.html
> 
> 
> 
> Doug
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
>
