RE: Concurrent searching & re-indexing

2005-02-18 Thread Paul Mellor
Ok, I will change my reindex method to delete all documents and then re-add
them all, rather than using an IndexWriter to write a completely new index.

Thanks for the help on this everyone.

Paul

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: 17 February 2005 22:26
To: Lucene Users List
Subject: Re: Concurrent searching & re-indexing


Paul Mellor wrote:
> I've read from various sources on the Internet that it is perfectly safe to
> simultaneously search a Lucene index that is being updated from another
> Thread, as long as all write access to the index is synchronized.  But does
> this apply only to updating the index (i.e. deleting and adding documents),
> or to a complete re-indexing (i.e. create a new IndexWriter with the
> 'create' argument true and then re-add all the documents)?
[...]
> java.io.IOException: couldn't delete _a.f1
> at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:166)
[...]
> This is running on Windows 2000.

On Windows one cannot delete a file while it is still open.  So, no, on 
Windows one cannot remove an index entirely while an IndexReader or 
Searcher is still open on it, since it is simply impossible to remove 
all the files in the index.
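
The behaviour described above can be probed with a small stand-alone program. The following is only a sketch using the Java standard library, not Lucene code: the names OpenFileDeleteDemo, createTempIndexFile and deleteWhileOpen are made up for illustration. The result of the delete attempted while the stream is open depends on the operating system, the filesystem and how the JVM opens the file, so the sketch prints it rather than assuming it.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class OpenFileDeleteDemo {

    /** Creates a small temp file standing in for an index file (hypothetical helper). */
    static File createTempIndexFile() {
        try {
            File file = File.createTempFile("segment", ".f1");
            try (FileOutputStream out = new FileOutputStream(file)) {
                out.write(42);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Tries to delete the file while a stream is still open on it; returns whether delete() succeeded. */
    static boolean deleteWhileOpen(File file) {
        try (FileInputStream in = new FileInputStream(file)) {
            return file.delete();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File file = createTempIndexFile();
        System.out.println("deleted while open: " + deleteWhileOpen(file));
        // The stream is closed by now, so any file that is still present can be removed.
        boolean gone = !file.exists() || file.delete();
        System.out.println("gone after close: " + gone);
    }
}
```

Running this on both a Windows machine and a Unix machine is a quick way to see whether the two platforms differ in the way the message describes.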

We might attempt to patch this by keeping a list of such files and 
attempting to delete them later (as is done when updating an index).  But 
this could cause problems, as a new index will eventually try to use 
these same file names again, and it would then conflict with the open 
IndexReader.  This is not a problem when updating an existing index, 
since filenames (except for a few which are not kept open, like 
segments) are never reused in the lifetime of an index.  So, in order 
for such a fix to work we would need to switch to globally unique 
segment names, e.g., long random strings, rather than increasing integers.
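
To make the naming conflict concrete, here is a hedged sketch in plain Java. It assumes, for illustration, that counter-style segment names are an underscore followed by the counter written in base 36 (under that assumption counter value 10 gives "_a", the name seen in the "_a.f1" of the trace). randomName is a hypothetical stand-in for the "long random strings" idea and is not an existing Lucene API.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

class SegmentNames {

    /** Counter-style name (assumed scheme): "_" plus the counter in base 36, so 10 becomes "_a". */
    static String counterName(int counter) {
        return "_" + Integer.toString(counter, Character.MAX_RADIX);
    }

    /** Hypothetical globally unique name: "_" plus 32 random hex characters. */
    static String randomName() {
        return "_" + UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        // Names of segments that a still-open reader holds from the old index.
        Set<String> heldOpen = new HashSet<>();
        for (int i = 1; i <= 10; i++) {
            heldOpen.add(counterName(i));
        }
        // A recreated index restarts its counter, so it asks for the same names again.
        System.out.println("counter name reused: " + heldOpen.contains(counterName(10)));
        // A random name is overwhelmingly unlikely to match any name already held open.
        System.out.println("random name reused: " + heldOpen.contains(randomName()));
    }
}
```

The design point is the one made in the message: a counter that restarts with each new index hands out names that an old reader may still have open, while names that never repeat avoid the clash.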

In the meantime, the safe way to rebuild an index from scratch while 
other processes are reading it is simply to delete all of its documents, 
then start adding new ones.

Doug





RE: Concurrent searching & re-indexing

2005-02-17 Thread Paul Mellor
Otis,

Looking at your reply again, I have a couple of questions -

> IndexSearcher (IndexReader, really) does take a snapshot of the index state
> when it is opened, so at that time the index segments listed in segments
> should be in a complete state.  It also reads index files when searching, of
> course.

1. If IndexReader takes a snapshot of the index state when opened and then
reads the files when searching, what would happen if the files it takes a
snapshot of are deleted before the search is performed (as would happen with
a reindexing in the period between opening an IndexSearcher and using it to
search)?

2. Does a similar potential problem exist when optimising an index, if this
combines all the segments into a single file?

Many thanks

Paul


RE: Concurrent searching & re-indexing

2005-02-17 Thread Morus Walter
Paul Mellor writes:
 
> 1. If IndexReader takes a snapshot of the index state when opened and then
> reads the files when searching, what would happen if the files it takes a
> snapshot of are deleted before the search is performed (as would happen with
> a reindexing in the period between opening an IndexSearcher and using it to
> search)?

On Unix, open files are still there even after they are deleted (that is,
there is no longer a link (filename) to the file, but the file's content
still exists). On Windows you cannot delete open files, so Lucene,
AFAIK (I don't use Windows), postpones the deletion until the
file is closed.

> 2. Does a similar potential problem exist when optimising an index, if this
> combines all the segments into a single file?

AFAIK optimising creates new files.

The only problem that might occur is opening a reader during an index change,
but that's handled by a lock.

HTH
Morus




Re: Concurrent searching & re-indexing

2005-02-16 Thread Otis Gospodnetic
Hi Paul,

If I understand your setup correctly, it looks like you are running
multiple threads that create an IndexWriter for the same directory.  That's
a no-no.

This section (first hit) describes the various concurrency issues with
regard to adds, updates, optimization, and searches:
  http://www.lucenebook.com/search?query=concurrent

IndexSearcher (IndexReader, really) does take a snapshot of the index
state when it is opened, so at that time the index segments listed in
segments should be in a complete state.  It also reads index files when
searching, of course.

Otis


--- Paul Mellor [EMAIL PROTECTED] wrote:

> Hi,
>
> I've read from various sources on the Internet that it is perfectly safe to
> simultaneously search a Lucene index that is being updated from another
> Thread, as long as all write access to the index is synchronized.  But does
> this apply only to updating the index (i.e. deleting and adding documents),
> or to a complete re-indexing (i.e. create a new IndexWriter with the
> 'create' argument true and then re-add all the documents)?
>
> I have a class which encapsulates all access to my index, so that writes can
> be synchronized.  This class also exposes a method to obtain an
> IndexSearcher for the index.  I'm running unit tests to test this which
> create many threads - each thread does a complete re-indexing and then
> obtains an IndexSearcher and does a query.
>
> I'm finding that with sufficiently high numbers of threads, I'm getting the
> occasional failure, with the following exception thrown when attempting to
> construct a new IndexWriter (during the reindexing) -
>
> java.io.IOException: couldn't delete _a.f1
>  at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:166)
>  at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:135)
>  at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:113)
>  at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:151)
>  ...
>
> The exception occurs quite infrequently (usually for somewhere between 1-5%
> of the Threads).
>
> Does the IndexSearcher take a 'snapshot' of the index at creation?  Or does
> it access the filesystem whilst searching?  I am also synchronizing creation
> of the IndexSearcher with the write lock, so that the IndexSearcher is not
> created whilst the index is being recreated (and vice versa).  But do I need
> to ensure that the IndexSearcher cannot search whilst the index is being
> recreated as well?
>
> Note that a similar unit test where the threads update the index (rather
> than recreate it from scratch) works fine, as expected.
>
> This is running on Windows 2000.
>
> Any help would be much appreciated!
>
> Paul



RE: Concurrent searching & re-indexing

2005-02-16 Thread Paul Mellor
But all write access to the index is synchronized, so that although multiple
threads are creating an IndexWriter for the same directory and using it to
totally recreate that index, only one thread is doing this at once.

I was concerned about the safety of using an IndexSearcher to perform
queries on an index that is in the process of being recreated from scratch,
but I guess that if the IndexSearcher takes a snapshot of the index when it
is created (and in my code this creation is synchronized with the write
operations as well so that the threads wait for the write operations to
finish before instantiating an IndexSearcher, and vice versa) this can't be
a problem.
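
For comparison, here is a hedged sketch of a stricter arrangement than the one described above, in which the lock covers each whole search rather than only the creation of the searcher. This is not something the thread proposes, and it is not Lucene code: GuardedIndexAccess, search and rebuild are hypothetical names, and the index is represented only by the actions passed in. The assumption is that each search action opens, uses and closes its own searcher inside the lock, so that no index files are held open while a rebuild runs.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

class GuardedIndexAccess {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Runs a complete search under the read lock; several searches may run at the same time. */
    <T> T search(Supplier<T> searchAction) {
        lock.readLock().lock();
        try {
            return searchAction.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Runs a rebuild under the write lock; it waits until all in-flight searches have finished. */
    void rebuild(Runnable rebuildAction) {
        lock.writeLock().lock();
        try {
            rebuildAction.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        GuardedIndexAccess access = new GuardedIndexAccess();
        access.rebuild(() -> System.out.println("rebuilding index"));
        String hit = access.search(() -> "first hit");
        System.out.println(hit);
    }
}
```

The cost of this design is that searches are blocked for the full duration of a rebuild, which is the trade-off that deleting and re-adding documents in an existing index avoids.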
