Re: Open an IndexWriter in parallel with an IndexReader on the same index.

2006-02-22 Thread Nadav Har'El
Chris Hostetter [EMAIL PROTECTED] wrote on 22/02/2006 03:24:58 AM:

 : It would have been nice if someone wrote something like IndexModifier,
 : but with a cache, similar to what Yonik suggested above: deletions will
 : not be done immediately, but rather cached and later done in batches.
 : Of course, batched deletions should not remember the term to delete,
 : but rather the matching document numbers at the time of the deletion -
 : because after the addition of the modified document if we search for
 : the term again we'll find two documents.

 That's not a safe sequence of events. An add can trigger a segment merge,
 which can renumber documents.
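To see why remembering document numbers is unsafe, here is a toy model. It is an assumption for illustration, not Lucene's real segment code: a document number is simply a position in the index, and a merge compacts away deleted documents. A number cached before an add can end up pointing at a different document after the merge:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not Lucene's data structures): a document number
// is the document's position in the index, and merging drops deleted docs.
class ToyIndex {
    private final List<String> docs = new ArrayList<>();      // doc number == list index
    private final List<Boolean> deleted = new ArrayList<>();

    // Appends a document and returns its current document number.
    int add(String id) {
        docs.add(id);
        deleted.add(false);
        return docs.size() - 1;
    }

    // Marks a document as deleted; its number is not reused until a merge.
    void delete(int docNum) {
        deleted.set(docNum, true);
    }

    // Compacts away deleted documents, so every document that came after a
    // deleted one gets a smaller number.
    void merge() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) kept.add(docs.get(i));
        }
        docs.clear();
        deleted.clear();
        for (String id : kept) add(id);
    }

    String idAt(int docNum) {
        return docs.get(docNum);
    }
}
```

For example, add "a", "b" and "c" (caching number 2 for "c"), delete "a", add "c-v2" and merge: number 2 now refers to "c-v2", so a deferred deletion of number 2 would remove the new version instead of the old one.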

I see. Then maybe there's a way to catch this merge and do the deletions
just before it, because...

 As Yonik said, you want to queue up the adds/updates, then do a delete for
 each update in your queue, then do your adds in one batch.

The problem with this solution is that, unlike queuing deletes, queuing
additions requires you to queue the actual document contents. Doing
this in memory might add a large memory penalty, which is undesirable
for applications that try to maintain a small memory footprint.

 Knowing when/what to delete requires knowing a key for your records -- which
 isn't a native Lucene concept, but it is certainly a general enough one
 that a helper class could be written for this.

I realise that the name of this delete key isn't defined by Lucene,
but I believe that the concept of such a key was officially
sanctioned by Lucene with the deleteDocuments(Term) method (whose
documentation even mentions the unique ID string scenario).
So indeed a helper class of this sort will probably be useful to
more than a few people.
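A helper of the kind Chris describes might look roughly like this. It is a hypothetical sketch: KeyedUpdateQueue and BatchIndex are made-up names, and the BatchIndex implementation is where calls such as IndexReader.deleteDocuments(new Term("id", key)) and IndexWriter.addDocument(doc) would go, assuming the unique key is stored in a field called "id".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface standing in for the code that talks to Lucene.
interface BatchIndex<D> {
    void deleteAllByKey(List<String> keys);   // e.g. one deleteDocuments(Term) per key
    void addAll(List<D> docs);                // e.g. one addDocument(doc) per doc
}

// Hypothetical helper (not a Lucene class): queues updates by unique key and
// applies them as one batch of deletes followed by one batch of adds.
class KeyedUpdateQueue<D> {
    // LinkedHashMap keeps the order in which keys were first queued; a later
    // update for the same key replaces the earlier document, so only the
    // newest version is added.
    private final Map<String, D> pending = new LinkedHashMap<>();

    void update(String key, D doc) {
        pending.put(key, doc);
    }

    int size() {
        return pending.size();
    }

    // Deleting by key (not by document number) is safe here because all the
    // deletes happen before any of the adds.
    void flush(BatchIndex<D> index) {
        if (pending.isEmpty()) return;
        index.deleteAllByKey(new ArrayList<>(pending.keySet()));
        index.addAll(new ArrayList<>(pending.values()));
        pending.clear();
    }
}
```

Because the queue holds the pending documents themselves, it has exactly the memory cost Nadav points out; a caller could bound it by flushing whenever size() passes some threshold.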

--
Nadav Har'El


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Open an IndexWriter in parallel with an IndexReader on the same index.

2006-02-21 Thread Pierre Luc Dupont
Hi,
 
Is it possible to open an IndexWriter and an IndexReader on the same
index at the same time, to do deleteTerm and addDocument?
 
Thanks!
Pierre-Luc


Re: Open an IndexWriter in parallel with an IndexReader on the same index.

2006-02-21 Thread Yonik Seeley
On 2/21/06, Pierre Luc Dupont [EMAIL PROTECTED] wrote:
 is it possible to open an IndexWriter and an IndexReader on the same
 index, at the same time,
 to do deleteTerm and addDocument?

No, it's not possible.  You should batch things: do all your
deletions, close the IndexReader, then open an IndexWriter and do all
the addDocument calls.

-Yonik
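Yonik's recipe can be sketched as follows. ExclusiveIndex is a hypothetical stand-in that only enforces the constraint he describes (no IndexWriter while an IndexReader doing deletions is open); in real code its methods would correspond to opening an IndexReader, calling deleteDocuments(Term), closing it, and then opening an IndexWriter, calling addDocument(Document) and closing it.

```java
import java.util.List;

// Hypothetical stand-in: a reader (for deletes) and a writer (for adds)
// may never be open at the same time.
class ExclusiveIndex {
    private boolean readerOpen = false;
    private boolean writerOpen = false;
    int deletes = 0, adds = 0;

    void openReader() {
        if (writerOpen) throw new IllegalStateException("writer is open");
        readerOpen = true;
    }

    void closeReader() { readerOpen = false; }

    void openWriter() {
        if (readerOpen) throw new IllegalStateException("reader is open");
        writerOpen = true;
    }

    void closeWriter() { writerOpen = false; }

    void delete(String key) {
        if (!readerOpen) throw new IllegalStateException("no reader open");
        deletes++;
    }

    void add(String doc) {
        if (!writerOpen) throw new IllegalStateException("no writer open");
        adds++;
    }
}

class BatchUpdater {
    // All deletions through the reader, close it; then all additions
    // through the writer, close it.
    static void apply(ExclusiveIndex index, List<String> deleteKeys, List<String> newDocs) {
        index.openReader();
        try {
            for (String key : deleteKeys) index.delete(key);
        } finally {
            index.closeReader();
        }
        index.openWriter();
        try {
            for (String doc : newDocs) index.add(doc);
        } finally {
            index.closeWriter();
        }
    }
}
```

The try/finally blocks make sure the reader is closed even if a deletion fails, so the writer phase is never blocked by a reader left open.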




RE: Open an IndexWriter in parallel with an IndexReader on the same index.

2006-02-21 Thread Pierre Luc Dupont
Ok, thanks.

That is what I was thinking. 

Pierre-Luc 




Re: Open an IndexWriter in parallel with an IndexReader on the same index.

2006-02-21 Thread Paul.Illingworth

I have a set of classes similar in function to IndexModifier but a little
more advanced. The idea is to keep the IndexReaders and IndexWriters open
as long as possible, closing them only when absolutely necessary. Using the
concurrency package allows me to have multiple readers and a single
writer. I use a watchdog timer to flush the index if it is idle for
a while (closing the index flushes unwritten additions on a writer and
unwritten deletions on a reader). The downside to this approach is that you
lose all the benefits of any caching that Lucene does of sorted results and
of any filters that are weakly cached on IndexReaders, which can have a
massive impact on searching (I sometimes take a hit of several seconds on
the first sorted query after the index is reopened).
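A rough sketch of the readers/writer arrangement Paul describes follows. java.util.concurrent is assumed here, since the message doesn't say which concurrency package was used. IdleFlushingIndex, idleMillis and flushAction are hypothetical names; flushAction stands for whatever closes the open IndexWriter or IndexReader so pending additions and deletions are written. For simplicity, only updates reset the idle timer.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch, not Paul's code: searches share a read lock, updates
// take the exclusive write lock, and a watchdog runs flushAction once no
// update has arrived for idleMillis.
class IdleFlushingIndex {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final ScheduledExecutorService watchdog =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "index-watchdog");
            t.setDaemon(true);   // don't keep the JVM alive just for the timer
            return t;
        });
    private final long idleMillis;
    private final Runnable flushAction;
    private ScheduledFuture<?> pendingFlush;
    private int unflushed = 0;   // guarded by lock

    IdleFlushingIndex(long idleMillis, Runnable flushAction) {
        this.idleMillis = idleMillis;
        this.flushAction = flushAction;
    }

    // Many searches may run concurrently under the shared read lock.
    <T> T search(Supplier<T> query) {
        lock.readLock().lock();
        try {
            return query.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    // One update at a time, and never while a search holds the read lock.
    void update(Runnable change) {
        lock.writeLock().lock();
        try {
            change.run();
            unflushed++;
        } finally {
            lock.writeLock().unlock();
        }
        restartIdleTimer();
    }

    // Each update pushes the flush back, so it fires only after a quiet period.
    private synchronized void restartIdleTimer() {
        if (pendingFlush != null) pendingFlush.cancel(false);
        pendingFlush = watchdog.schedule(this::flush, idleMillis, TimeUnit.MILLISECONDS);
    }

    void flush() {
        lock.writeLock().lock();
        try {
            if (unflushed > 0) {
                flushAction.run();
                unflushed = 0;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    int pending() {
        lock.readLock().lock();
        try {
            return unflushed;
        } finally {
            lock.readLock().unlock();
        }
    }

    void shutdown() {
        watchdog.shutdownNow();
        flush();
    }
}
```

Every flush closes and later reopens the index, which is exactly where the lost sorting and filter caches Paul mentions come from; a longer idleMillis trades fresher data on disk for fewer cold-cache queries.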

In addition to this index management code I have a queue onto which I place
new documents/updates/deletes. Every time a document is added, the tasks on
the queue are reordered to batch up similar actions. The downside here is
that this is effectively single-threaded, which arguably affects
performance. In addition, for this to work well (and to prevent the
equivalent of a database's dirty read) I also have to queue up queries
against the index until all previous updates have been carried out;
not ideal, but not causing too significant a problem in my situation.
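The dirty-read protection Paul mentions can be sketched with a single-threaded executor: because updates and queries share one queue, a query always runs after every update submitted before it. This is a hypothetical illustration; SerializedIndex is a made-up name and the in-memory list stands in for the Lucene index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: every update and every query is submitted to the same
// single-threaded executor, so tasks run strictly in submission order and a
// query never sees the index before earlier updates have been applied.
class SerializedIndex {
    private final ExecutorService queue = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "index-queue");
        t.setDaemon(true);
        return t;
    });
    // Stand-in for the Lucene index; only ever touched from the queue thread.
    private final List<String> docs = new ArrayList<>();

    Future<?> add(String doc) {
        return queue.submit(() -> { docs.add(doc); });
    }

    Future<?> delete(String doc) {
        return queue.submit(() -> { docs.remove(doc); });
    }

    // Queries wait their turn behind all previously submitted updates.
    Future<Integer> count(String doc) {
        return queue.submit(() -> {
            int n = 0;
            for (String d : docs) {
                if (d.equals(doc)) n++;
            }
            return n;
        });
    }

    void shutdown() {
        queue.shutdown();
    }
}
```

The trade-off is the one Paul notes: all work is serialized, so a slow query also delays the updates queued behind it.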

My experience is that producing something to manage the indexes, swapping
between readers and writers, is relatively straightforward. The task of
batching up updates/deletes and the like may be too application-specific:
my code relies on my own unique document IDs being in the index and so is
quite specific.

Regards

Paul I.


 Nadav Har'El [EMAIL PROTECTED]
 21/02/2006 15:35
 Please respond to [EMAIL PROTECTED]

 To: java-user@lucene.apache.org
 cc:
 Subject: Re: Open an IndexWriter in parallel with an IndexReader on the same index.

Yonik Seeley [EMAIL PROTECTED] wrote on 21/02/2006 05:13:52 PM:
 On 2/21/06, Pierre Luc Dupont [EMAIL PROTECTED] wrote:
  is it possible to open an IndexWriter and an IndexReader on the same
  index, at the same time,
  to do deleteTerm and addDocument?

 No, it's not possible.  You should batch things: do all your
 deletions, close the IndexReader, then open an IndexWriter and do all
 the addDocument calls.

For some applications, the separation of IndexWriter (which can add a
document) and IndexReader (which can delete a document) is very
inconvenient.
For example, consider a case where documents are often updated, and we
often need to find and remove the old document and add the new version
of the document. The IndexModifier class nicely hides the complexity
from us and allows both addition and deletion, but the documentation
warns that its performance is poor when used in the way I just outlined:
imagine 1000 documents being modified, and now we start deleting and
adding each one, one after another.
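The 1000-document example can be put in numbers with a toy cost model. This is an assumption, not a measurement: it counts how often a reader or writer has to be opened when updates alternate delete/add, versus when all deletes are done before all adds. ReopenCount and its methods are hypothetical names.

```java
// Toy cost model (hypothetical): each time an IndexModifier-style wrapper
// goes from deleting to adding, or back, it has to close one of
// IndexReader/IndexWriter and open the other.
class ReopenCount {
    // ops is a sequence of 'D' (delete, needs a reader) and 'A' (add, needs
    // a writer). Returns how many times a reader or writer must be opened.
    static int opens(String ops) {
        int count = 0;
        char open = ' ';   // nothing open yet
        for (char op : ops.toCharArray()) {
            if (op != open) {   // wrong kind is open: close it, open the other
                count++;
                open = op;
            }
        }
        return count;
    }

    // Update n documents one by one: delete the old version, add the new one.
    static String oneByOne(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append("DA");
        return sb.toString();
    }

    // Delete all n old versions first, then add all n new ones.
    static String batched(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append('D');
        for (int i = 0; i < n; i++) sb.append('A');
        return sb.toString();
    }
}
```

With n = 1000, updating one document at a time means 2000 opens, while the batched order needs only 2.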

It would have been nice if someone wrote something like IndexModifier,
but with a cache, similar to what Yonik suggested above: deletions will
not be done immediately, but rather cached and later done in batches.
Of course, batched deletions should not remember the term to delete,
but rather the matching document numbers at the time of the deletion -
because after the addition of the modified document if we search for
the term again we'll find two documents.

What about this idea? Does an implementation of something similar
already exist?

--

Nadav Har'El

