Thanks for the advice Paul. I thought about doing two passes: delete all, then insert all. But the problem with that approach is that if my program fails somewhere between start and end, I may end up with many deleted records and none reinserted. The same could happen with a batch build. How are you handling that possible scenario?
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 11, 2005 11:22 AM
To: java-user@lucene.apache.org
Subject: Re: Insert new records into index

Hello,

You really do need to batch up your deletes and inserts, otherwise it will take a long time. If you can, do all your deletes and then all of your inserts. I have gone to the trouble of queueing index operations, and when a new operation comes along I reorder the job queue to ensure deletes and indexing jobs are grouped together.

If your system doesn't allow you to batch deletes and writes together, something I have found useful is to split the index into two. I have an "old" index and a "new" index. I add documents to the "new" index and then periodically merge it into the "old" and clear out the "new". To delete, I have to delete from both indexes, as I don't know which one holds a given document. This means that you only ever have to open the "old" index with an IndexReader (except when merging, of course). The "new" index can be opened with a writer for writes and swapped to a reader for reads; keeping this index small speeds up the constant opening and closing. Searching is straightforward using the MultiSearcher.

I don't know anything about the lock problem - it's not something I've ever seen (I'm using 1.4.3).

Regards,
Paul I.

From: "Aigner, Thomas" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>
Date: 11/11/2005 14:55
Subject: Insert new records into index

Howdy all,

I am having a problem with inserting/updating records in my index. I have approximately 1.5M records in the index, taking about 2.5G of space when optimized. When I want to update 1000 records, I delete each old item and insert the new one. This is taking a LONG time to accomplish. I believe this is taking time due to the fact that I have to close the writer to delete from the reader, then reopen the writer to insert the new record.
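Paul's "reorder the job queue so deletes and indexing jobs are grouped together" can be sketched in plain Java. The class and operation names below (BatchPlanner, Op, Kind) are illustrative, not part of Lucene or Paul's actual code; the point is a stable partition of a mixed queue so all deletes can run in one IndexReader session and all inserts in one IndexWriter session:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchPlanner {
    // Hypothetical operation kinds for an index-maintenance queue:
    // DELETE removes a document by key, INSERT adds a new document.
    enum Kind { DELETE, INSERT }

    static final class Op {
        final Kind kind;
        final String key;
        Op(Kind kind, String key) { this.kind = kind; this.key = key; }
    }

    // Stable partition: all deletes first, then all inserts, each group
    // keeping its original relative order. Running the grouped plan needs
    // only one reader session (deletes) followed by one writer session
    // (inserts), instead of a close/open cycle per record.
    static List<Op> group(List<Op> queue) {
        List<Op> deletes = new ArrayList<Op>();
        List<Op> inserts = new ArrayList<Op>();
        for (Op op : queue) {
            if (op.kind == Kind.DELETE) {
                deletes.add(op);
            } else {
                inserts.add(op);
            }
        }
        List<Op> plan = new ArrayList<Op>(deletes);
        plan.addAll(inserts);
        return plan;
    }
}
```

A real queue would also need the crash-safety handling Thomas asks about above (e.g. only dequeue a delete once its matching insert has committed); that part is not shown here.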
I have to do this one time for each item that needs to be inserted. I tried not optimizing the index, thinking that opening/closing the index was taking the big time, but the time seems to be the same when I have many files (I had to increase the ulimit by quite a bit to avoid the "too many files" error as well).

Snippets of code. To delete a record, I am doing:

    // Close the index
    writer.close();
    // Instantiate the reader object for deletion
    IndexReader reader = IndexReader.open(dir);
    reader.delete(new Term("simm", simm));
    reader.close();
    // Get the directory again and create a new writer to open for insert

Then I insert the record and move on to the next one. I can't keep the writer open while I delete an item because I get this error:

    Lock obtain timed out: Lock@

Anyone have any ideas on how to speed this process up?

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
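For readers of this thread: Paul's "all deletes, then all inserts" advice applied to Thomas's snippet comes out to something like the sketch below. It uses the Lucene 1.4-era API seen in the thread (IndexReader.delete(Term), the IndexWriter append constructor); dir, analyzer, the updates array, and the Record helper are assumptions for illustration, not the poster's actual code:

    // Phase 1: one reader session performs every delete.
    IndexReader reader = IndexReader.open(dir);
    for (int i = 0; i < updates.length; i++) {
        reader.delete(new Term("simm", updates[i].simm));
    }
    reader.close();  // releases the lock before the writer opens

    // Phase 2: one writer session performs every insert.
    // The third argument false means "append to existing index".
    IndexWriter writer = new IndexWriter(dir, analyzer, false);
    for (int i = 0; i < updates.length; i++) {
        writer.addDocument(updates[i].toDocument());
    }
    writer.optimize();  // optional segment merge, can be deferred
    writer.close();

Because the reader is fully closed before the writer opens, the "Lock obtain timed out" error from interleaving the two goes away; the crash-recovery concern Thomas raises in his reply (deletes committed, inserts lost) still has to be handled outside this sketch, e.g. by keeping the source records until the insert phase completes.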