Eran,

Make no mistake: the poor performance you are experiencing is due to calling commit on every document addition, not to internal 'coding by exception'. Lucene's transactional capabilities ensure that your documents are added and persisted to disk. Check out the IndexWriter documentation for more information.
The only 'connection' between the reader and the writer is the set of files on disk. The writer writes them once and never updates them, and the reader holds a reference to each file so that it is not deleted out from underneath it while it is still needed for searches.

During a commit, all of your changes are written to disk and any necessary segment merges take place. The older segments that were merged together become 'orphans' that are no longer referenced by the segments file. In the final stage of the commit, after all of the new segments have been written, the writer attempts to delete those orphaned segments, and that attempt fails because your reader still has them open. It fails gracefully: the file names are kept internally so the delete can be attempted again later, hopefully after the reader has been reopened and no longer holds a reference to the orphaned files.

I suggest you step through the commit process in a debugger or use a profiler to demonstrate this issue.

Michael

-----Original Message-----
From: Eran Sevi [mailto:erans...@gmail.com]
Sent: Tue 11/17/2009 4:55 AM
To: lucene-net-user@incubator.apache.org
Subject: Re: IndexWriter is slow when reader is open

Michael,

Thanks for the answer. I thought the reader was less connected to the writer. Basically what you're saying is that as long as at least one reader is open, exceptions are thrown when trying to commit changes (or more accurately, when trying to merge segments)? Can you point me to the place in the source code where that happens? What happens to the new documents that were added? Are they still saved in other segments?

It's very important to us to make sure every document is persisted in the index, so working in batches could be a problem. But if there's a way to save each added document to disk without merging the segment with older segments, this could solve our problem.
And since the reader can't see the new segments anyway until it's reopened, I don't see a problem with continuing to write documents to new segments without performing a merge. I'll try to change the merge policy/scheduler and see what happens.

Anyway, coding by exception is quite bad practice. Since we're following the Java versions, I guess it'll take time to be able to change that.

Eran.

On Mon, Nov 16, 2009 at 8:56 PM, Michael Garski <mgar...@myspace-inc.com> wrote:

> Eran,
>
> The root cause of the issue is calling commit after every document
> addition while having a reader open. Calls to commit should be batched up -
> we frequently use batches of 100 or 1000 between commits.
>
> This is by design within Lucene. Adding documents will cause segments to
> merge, and the writer will then delete the older segments that have been
> merged together to create a new one. With an open reader, however, the
> writer will not be able to delete the older segments due to a file lock held
> by the reader. On the call to delete the file an exception is thrown and
> swallowed internally, and the name of the file that the delete was attempted
> upon is added to a list of files that can be deleted on another call.
>
> I suggest you refrain from calling commit so often, as that is why you are
> experiencing performance issues.
>
> Michael
>
>
> -----Original Message-----
> From: Eran Sevi [mailto:erans...@gmail.com]
> Sent: Mon 11/16/2009 5:07 AM
> To: lucene-net-user@incubator.apache.org
> Subject: Re: IndexWriter is slow when reader is open
>
> I've tried to use it with read-only mode and it looks like it's even worse
> right now.
>
> I must admit that we're abusing the indexing a bit by committing after each
> document addition, but still, when there's no reader open, each document is
> indexed in about 30-50ms, and when there's a read-only reader open, each
> document is indexed in about 150-500ms.
> Why should an open reader affect the commit process so deeply?
>
> I wonder if no one has encountered this phenomenon before.
>
>
> On Sat, Nov 14, 2009 at 8:27 PM, Matt Honeycutt <mbhoneyc...@gmail.com> wrote:
>
> > 2.4 does indeed support read-only mode. I don't know how much it will
> > help, but I would definitely try it.
> >
> > On 11/14/09, Eran Sevi <erans...@gmail.com> wrote:
> > > I'm still using version 2.4 so I think there's still no read-only mode.
> > > Is there no other way to prevent this slowdown in previous versions?
> > >
> > > Eran.
> > >
> > > On Thu, Nov 12, 2009 at 8:16 PM, Michael Garski
> > > <mgar...@myspace-inc.com> wrote:
> > >
> > >> Eran,
> > >>
> > >> What version of Lucene are you using? Are you opening the IndexReader
> > >> in read-only mode?
> > >>
> > >> Michael
> > >>
> > >> -----Original Message-----
> > >> From: Eran Sevi [mailto:erans...@gmail.com]
> > >> Sent: Thursday, November 12, 2009 9:06 AM
> > >> To: lucene-net-user@incubator.apache.org
> > >> Subject: IndexWriter is slow when reader is open
> > >>
> > >> Hi,
> > >> I'm using Lucene.Net 2.4 and I just noticed that when I index documents
> > >> while there's at least one IndexReader open on that index (even without
> > >> doing anything), the indexing speed is slower by a factor of 3 to 5.
> > >> When closing the reader, the indexing speed goes back to normal.
> > >> I'm not doing any deletes, only adds.
> > >>
> > >> My index is going to be updated regularly and there's going to be a
> > >> reader/searcher in use almost all the time so this might be a big
> > >> problem for me.
> > >>
> > >> Does anyone have a clue if this is normal behavior? Why does it happen
> > >> and how can I avoid such a big loss in performance?
> > >>
> > >>
> > >> Thanks,
> > >> Eran.
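
A minimal sketch of the commit batching recommended in this thread (committing every 100 or 1000 additions rather than after each one). It is written in Java against a hypothetical DocumentSink interface and does not use the Lucene or Lucene.Net API; DocumentSink, BatchingWriter, and CountingSink are names invented here for illustration. In Lucene.Net the two steps would presumably map to IndexWriter.AddDocument and IndexWriter.Commit. The trade-off raised in the thread still applies: documents added since the last commit are not yet guaranteed to be on disk, so a crash could lose up to one batch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an index writer; NOT the Lucene.Net API. */
interface DocumentSink {
    void add(String document);
    void commit();
}

/** Commits once every batchSize additions instead of after each one. */
final class BatchingWriter {
    private final DocumentSink sink;
    private final int batchSize;
    private int uncommitted = 0;

    BatchingWriter(DocumentSink sink, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.sink = sink;
        this.batchSize = batchSize;
    }

    /** Adds a document and commits only when a full batch has accumulated. */
    void add(String document) {
        sink.add(document);
        uncommitted++;
        if (uncommitted >= batchSize) {
            flush();
        }
    }

    /** Commits whatever is pending; call on shutdown or from a timer. */
    void flush() {
        if (uncommitted > 0) {
            sink.commit();
            uncommitted = 0;
        }
    }
}

/** In-memory sink that only records calls, for demonstration. */
final class CountingSink implements DocumentSink {
    final List<String> documents = new ArrayList<>();
    int commits = 0;

    public void add(String document) { documents.add(document); }
    public void commit() { commits++; }
}
```

With a batch size of 100, adding 250 documents through BatchingWriter triggers two commits, and a final flush() commits the remaining 50. Calling flush() from a short timer as well would bound how long an added document can remain uncommitted.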