On Dec 24, 2008, at 12:23 PM, Jason Rutherglen wrote:

> Also, what are the requirements? Must a document be visible to search within 10ms of being added?

0-5ms. Otherwise it's not realtime, it's batch indexing. The realtime system can support small batches by encoding them into RAMDirectories if they are of sufficient size.

> Or must it be visible to search from the time that the call to add it returns?

Most people probably expect the update latency offered by SQL databases.

This is the problem spot. In an SQL database, when an update/add occurs, the same connection/transaction will see the changes when requested IMMEDIATELY - there is 0 latency.

In order to do this you MUST have the concept of transactions and/or connections.

OR you must make it so that every update/add is immediately available - this is probably simpler.

You just need to always search both the RAM index and the disk index. Deletions must be mapped onto the disk index, and the "latest" version of a document must be taken from the RAM index (if it is there).
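
To make the lookup order concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: the class TwoTierIndex, its methods, and the idea of keying whole documents by a unique id are all assumptions made up for illustration. It only shows that every search consults both tiers, that an update or delete masks the on-disk copy, and that the RAM copy wins when both exist.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical two-tier index; an illustration only, not Lucene code. */
class TwoTierIndex {
    // Whole documents keyed by a unique id (assumption: a real index stores postings).
    private final Map<String, String> disk = new HashMap<>();
    private final Map<String, String> ram = new HashMap<>();
    // Ids whose on-disk copy is stale: deleted, or superseded by a RAM copy.
    private final Set<String> deletedOnDisk = new HashSet<>();

    TwoTierIndex(Map<String, String> committed) {
        disk.putAll(committed);
    }

    /** The new version goes to RAM; any older disk copy is masked at once. */
    synchronized void addOrUpdate(String id, String doc) {
        if (disk.containsKey(id)) {
            deletedOnDisk.add(id);
        }
        ram.put(id, doc);
    }

    /** A delete removes the RAM copy and masks the disk copy. */
    synchronized void delete(String id) {
        ram.remove(id);
        if (disk.containsKey(id)) {
            deletedOnDisk.add(id);
        }
    }

    /** Latest version of one document: RAM first, then unmasked disk. */
    synchronized Optional<String> get(String id) {
        String latest = ram.get(id);
        if (latest != null) {
            return Optional.of(latest);
        }
        if (deletedOnDisk.contains(id)) {
            return Optional.empty();
        }
        return Optional.ofNullable(disk.get(id));
    }

    /** Every search consults both tiers; masked disk documents are skipped. */
    synchronized SortedSet<String> search(String term) {
        SortedSet<String> hits = new TreeSet<>();
        for (Map.Entry<String, String> e : ram.entrySet()) {
            if (e.getValue().contains(term)) {
                hits.add(e.getKey());
            }
        }
        for (Map.Entry<String, String> e : disk.entrySet()) {
            if (!deletedOnDisk.contains(e.getKey()) && e.getValue().contains(term)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}
```

With this sketch, a call to addOrUpdate is visible to the very next search on any thread, which is the "immediately available" behaviour described above, without transactions or connections. The substring match stands in for a real term query.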

You just need to merge the ram and disk in the background... and continually create new/merged ram disks.

The memory requirements are going to go up, but you can always add a "block" so that if the background merger gets too far behind, the system blocks any current requests (to avoid the system running out of memory).
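
The "block" can be sketched with a bounded queue between the writers and the background merger. Again this is plain Java rather than Lucene code, and BackpressuredMerger, its method names, and the choice of a fixed limit on pending RAM segments are assumptions for illustration. Searching is left out; the sketch only shows how a writer stalls once the merger falls too far behind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical merger with backpressure; an illustration only, not Lucene code. */
class BackpressuredMerger {
    // RAM segments waiting to be merged; the capacity bounds memory use.
    private final BlockingQueue<List<String>> pending;
    // Stand-in for the on-disk index.
    private final List<String> disk = new ArrayList<>();

    BackpressuredMerger(int maxPendingSegments) {
        pending = new ArrayBlockingQueue<>(maxPendingSegments);
    }

    /** Writer side: blocks while the merger is maxPendingSegments behind. */
    void flush(List<String> ramSegment) throws InterruptedException {
        pending.put(new ArrayList<>(ramSegment));
    }

    /** Writer side, fail-fast variant: returns false instead of blocking. */
    boolean tryFlush(List<String> ramSegment) {
        return pending.offer(new ArrayList<>(ramSegment));
    }

    /** Merges one pending segment if there is one; returns false when idle. */
    boolean mergeOnce() {
        List<String> segment = pending.poll();
        if (segment == null) {
            return false;
        }
        mergeIntoDisk(segment);
        return true;
    }

    /** Starts a daemon thread that merges segments as they arrive. */
    Thread startBackgroundMerger() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    mergeIntoDisk(pending.take()); // waits when nothing is pending
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop on interrupt
            }
        }, "background-merger");
        t.setDaemon(true);
        t.start();
        return t;
    }

    private void mergeIntoDisk(List<String> segment) {
        synchronized (disk) {
            disk.addAll(segment);
        }
    }

    int pendingSegments() {
        return pending.size();
    }

    int diskSize() {
        synchronized (disk) {
            return disk.size();
        }
    }
}
```

A writer would call flush after filling a RAM segment. With a limit of, say, 2 pending segments, the third flush waits until the merger has drained one, so memory use stays bounded at the cost of occasional write stalls.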



> As a baseline, how fast is it to simply use RAMDirectory?

It depends on how fast searches over the realtime index need to be. The detriment to speed occurs with having many small segments that are continuously decoded (terms, postings, etc.). The advantage of MemoryIndex and InstantiatedIndex is an actual increase in search speed compared with RAMDirectory (see the Performance Notes at http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/index/memory/MemoryIndex.html) and no need to continuously decode segments that are short-lived.

Anecdotal tests indicated that the merging overhead of RAMDirectory, compared with MemoryIndex (MI) or InstantiatedIndex (II), is significant enough to make it useful only for batches in the 1000s, which does not seem to be what people expect from realtime search.

On Wed, Dec 24, 2008 at 9:53 AM, Doug Cutting <cutt...@apache.org> wrote:
Jason Rutherglen wrote:
2) Implement realtime search by incrementally creating and merging readers in memory. The system would use MemoryIndex or InstantiatedIndex to quickly (more quickly than RAMDirectory) create indexes from added documents.

As a baseline, how fast is it to simply use RAMDirectory? If one, e.g., flushes changes every 10ms or so, and has a background thread that uses IndexReader.reopen() to keep a fresh version for reading?

Also, what are the requirements? Must a document be visible to search within 10ms of being added? Or must it be visible to search from the time that the call to add it returns? In the latter case one might still use an approach like the above. Writing a small new segment to a RAMDirectory and then, with no merging, calling IndexReader.reopen(), should be quite fast. All merging could be done in the background, as should post-merge reopens() that involve large segments.

In short, I wonder if new reader and writer implementations are in fact required or whether, perhaps with a few optimizations, the existing implementations might meet this need.

Doug
