I have a fairly straightforward task: I have a collection of N documents and a set of "hot" words. I need to find all occurrences of these words in all the docs.
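To make the task concrete, here is a minimal sketch in plain Python of the result I am after. It does not use the indexing library at all: it simply scans the text, and the function name `find_hot_word_hits`, the whitespace/punctuation tokenization, and the sample docs are all made up for illustration.

```python
import re
from collections import defaultdict

def find_hot_word_hits(docs, hot_words):
    """Return {doc_id: {word: [token positions]}} for every hot word found.

    Hypothetical helper: a brute-force scan, not an index-based search.
    """
    hot = {w.lower() for w in hot_words}
    hits = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # Assumption: tokens are runs of word characters, compared case-insensitively.
        for pos, token in enumerate(re.findall(r"\w+", text.lower())):
            if token in hot:
                hits[doc_id][token].append(pos)
    # Convert nested defaultdicts to plain dicts; docs with no hits are omitted.
    return {doc_id: dict(words) for doc_id, words in hits.items()}

docs = {1: "The quick brown fox", 2: "A fox saw another fox"}
result = find_hot_word_hits(docs, ["fox", "quick"])
print(result)
```

The output is grouped per document, which is the shape I ultimately need regardless of how the hits are found.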
The original use case was that I would get all the docs at once. In this case, I:

1. Create a single index for all the docs
2. Loop over all hot words. For each word, I find all hits in all the docs
3. Collect and rearrange the hit info so that all hits are grouped by indexed doc

However, it looks like there might be a different use case: the user might want to add one document at a time to the collection and see the search results immediately. So for this case I am now doing the following:

1. Loop over docs i = 1 : N. For each doc:
   1.1 If i == 1 then create the index, else update it
   1.2 Loop over all hot words. For each word, find all hits in all the docs that have been indexed so far, i.e. docs 1 through i
   1.3 Collect and rearrange

Of course, this is not particularly efficient, especially because I am forced to do a lot of redundant work by searching through docs 1:i instead of just doc i at each iteration. This is because, if I understand it correctly, I can't specify "search only the part of the index that corresponds to doc X". Or can I?

Is there any way to make this incremental index/search more efficient? For instance, is it at all possible to restrict where in the index a search for hits is performed? Or is there any other optimization?

Thanks much,
Ilya Zavorin