Think about IndexModifier to change your index, although the documentation does state that it's better to batch your deletions together and batch your additions together if possible.
100Mb is not, in my experience, a very big index, so I really don't anticipate many problems. Do note that you can't modify a document in place. That is, to update a document, you need to delete it from the index and re-add it. I believe, if you just write some code to take some timings, your questions will be answered to your satisfaction. I also believe that if you open your index, change it, and close it for each document, your performance will look much worse (on a per-document basis) than if you open your index, do a bunch of modifications, and *then* close it. Also note that you MUST close and re-open your IndexSearcher for any changes in your index to be visible. A Searcher will only show the documents in the index when it was opened (don't worry, it works just fine as you modify your index, it just doesn't show those modifications). However, you want to keep your single instance of a searcher open as long as possible since opening a Searcher is a very expensive operation. This really leaves you with the question "How quickly to you need to see the results of your index modifications?" If the answer is that an hour's delay is good enough, you can just close/re-open your searcher every hour (perhaps just after doing your updates). Finally, it would be helpful if we knew how frequently is "frequently" when you talk about updates. Are you talking hundreds of documents a minute? tens? a document an hour? Again, I think you'll have a better idea of whether updating is acceptable if you try an experiment and just take some timings. I've noticed that when creating an index, the optimize step is a long one..... Hope this helps Erick P.S. Your English is waaaaaaay better than my ... well... any other language <G>..... On 10/3/06, Enrique Lamas <[EMAIL PROTECTED]> wrote:
Hi, I'm working with a 100Mb length index. By application requirements, the information indexed is frecuently updated, with plenty of modifications, deletions and additions. I think Lucene is a very powerful searching tool once the index is already created, but I'm not sure if update index frecuently is also efficient. Even when index is quite long. Previosly, I regenerated all index from zero once at the day, but now I need changes to be noticed earlier. ¿How would you implement this? Sorry for my bad english. Thanks