> Also, what are the requirements? Must a document be visible to search
> within 10ms of being added?

0-5ms. Otherwise it's not realtime, it's batch indexing. The realtime
system can support small batches by encoding them into RAMDirectories if
they are of sufficient size.

> Or must it be visible to search from the time that the call to add it
> returns?

Most people probably expect the update latency offered by SQL databases.

> As a baseline, how fast is it to simply use RAMDirectory?

It depends on how fast searches over the realtime index need to be. Search
speed suffers when there are many small segments that must be continuously
decoded (terms, postings, etc.). The advantage of MemoryIndex and
InstantiatedIndex is an actual increase in search speed compared with
RAMDirectory (see the Performance Notes at
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/index/memory/MemoryIndex.html)
and no need to continuously decode short-lived segments.

Anecdotal tests indicated that the merging overhead of RAMDirectory, compared
with MI or II, is significant enough to make it useful only for batches in
the thousands, which does not seem to be what people expect from realtime
search.

On Wed, Dec 24, 2008 at 9:53 AM, Doug Cutting <cutt...@apache.org> wrote:

> Jason Rutherglen wrote:
>
>> 2) Implement realtime search by incrementally creating and merging readers
>> in memory. The system would use MemoryIndex or InstantiatedIndex to quickly
>> (more quickly than RAMDirectory) create indexes from added documents.
>
> As a baseline, how fast is it to simply use RAMDirectory? If one, e.g.,
> flushes changes every 10ms or so, and has a background thread that uses
> IndexReader.reopen() to keep a fresh version for reading?
>
> Also, what are the requirements? Must a document be visible to search
> within 10ms of being added? Or must it be visible to search from the time
> that the call to add it returns? In the latter case one might still use an
> approach like the above.
> Writing a small new segment to a RAMDirectory and then, with no merging,
> calling IndexReader.reopen(), should be quite fast. All merging could be
> done in the background, as should post-merge reopens() that involve large
> segments.
>
> In short, I wonder if new reader and writer implementations are in fact
> required or whether, perhaps with a few optimizations, the existing
> implementations might meet this need.
>
> Doug
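
To make the trade-off in this thread concrete, here is a toy sketch of the
pattern being discussed: each add writes one small in-memory segment, a new
point-in-time reader is published straight away, and a separate merge step
collapses the small segments later. It deliberately uses no Lucene classes.
RealtimeSketch, Segment, Reader, add and mergeAll are hypothetical names made
up for illustration, and the whitespace tokenizer is an assumption. It models
only the visibility and segment-count behaviour, not real decoding or merging
cost.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Toy model only: not Lucene, and not how Lucene stores segments.
final class RealtimeSketch {

    // An immutable in-memory "segment": term -> ascending doc ids.
    static final class Segment {
        final Map<String, List<Integer>> postings;
        Segment(Map<String, List<Integer>> postings) { this.postings = postings; }
    }

    // An immutable point-in-time view over a fixed list of segments.
    static final class Reader {
        final List<Segment> segments;
        Reader(List<Segment> segments) { this.segments = List.copyOf(segments); }

        // Visits every segment, so cost grows with the number of small segments.
        List<Integer> search(String term) {
            List<Integer> hits = new ArrayList<>();
            for (Segment s : segments) {
                hits.addAll(s.postings.getOrDefault(term, List.of()));
            }
            Collections.sort(hits);
            return hits;
        }
    }

    private final AtomicReference<Reader> current =
            new AtomicReference<>(new Reader(List.of()));
    private int nextDocId = 0;

    // Writes a one-document segment and publishes a new reader before returning,
    // so the document is searchable as soon as add() returns. No merging here.
    synchronized int add(String text) {
        int id = nextDocId++;
        Map<String, List<Integer>> postings = new HashMap<>();
        for (String term : new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")))) {
            if (!term.isEmpty()) {
                postings.put(term, List.of(id));
            }
        }
        List<Segment> segs = new ArrayList<>(current.get().segments);
        segs.add(new Segment(postings));
        current.set(new Reader(segs));
        return id;
    }

    // Collapses all segments into one and publishes a reader over the result.
    // Simplification: this holds the writer lock; a real system would merge
    // in a background thread without blocking adds.
    synchronized void mergeAll() {
        Map<String, List<Integer>> merged = new TreeMap<>();
        for (Segment s : current.get().segments) {
            for (Map.Entry<String, List<Integer>> e : s.postings.entrySet()) {
                merged.computeIfAbsent(e.getKey(), t -> new ArrayList<>()).addAll(e.getValue());
            }
        }
        current.set(new Reader(List.of(new Segment(merged))));
    }

    // Lock-free: searchers take whatever reader was published most recently.
    Reader reader() { return current.get(); }

    public static void main(String[] args) {
        RealtimeSketch index = new RealtimeSketch();
        index.add("realtime search");
        index.add("batch indexing search");
        System.out.println(index.reader().search("search"));
        index.mergeAll();
        System.out.println(index.reader().segments.size());
    }
}
```

A reader obtained before an add keeps seeing its old segment list, which is
the same point-in-time behaviour one gets by holding on to an old reader and
reopening for a fresh one. The search loop also shows why many small unmerged
segments slow searches down: every query touches every segment until a merge
runs.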