MemoryIndex was designed to maximize performance for a specific use
case: a pure in-memory data structure, at most one document per
MemoryIndex instance, any number of fields, high-frequency reads,
high-frequency index writes, no thread safety required, and optional
support for storing offsets.
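To make the single-document trade-off concrete, here is a hypothetical, much-simplified sketch (not MemoryIndex's actual implementation) of the kind of structure that case allows: one term-to-positions map per field, built once for the single document and then read many times. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a single-document, in-memory inverted index.
// Because there is exactly one document, postings reduce to a per-field
// map of term -> token positions, with no doc IDs or merging needed.
class SingleDocIndexSketch {
    private final Map<String, Map<String, List<Integer>>> fields = new HashMap<>();

    void addField(String field, String[] tokens) {
        Map<String, List<Integer>> postings =
            fields.computeIfAbsent(field, f -> new HashMap<>());
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
        }
    }

    // The read path is just two hash lookups -- this is where the
    // single-document restriction buys its speed.
    int freq(String field, String term) {
        Map<String, List<Integer>> postings = fields.get(field);
        if (postings == null) return 0;
        List<Integer> positions = postings.get(term);
        return positions == null ? 0 : positions.size();
    }
}
```

Extending this to many documents is exactly where the structure would have to change, as discussed below.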
I briefly considered extending it to the multi-document case, but
eventually refrained from doing so, because I didn't really need such
functionality myself (no itch). Here are some issues to consider when
attempting such an extension:
- The internal data structure would probably look quite different
- Data structure/algorithmic trade-offs regarding time vs. space, read
vs. write frequency, common vs. less common use cases
- Hence, it may well turn out that there's not much to reuse.
- A priori, it isn't clear whether a new solution would be
significantly faster than normal RAMDirectory usage. Thus...
- Need benchmark suite to evaluate the chosen trade-offs.
- Need tests to ensure correctness (in practice, meaning it behaves
just like the existing alternative).
I'd say it's a non-trivial undertaking. Right now, for example, I
don't have time for such an effort. That doesn't mean it's impossible
or shouldn't be done, of course. If someone would like to run with it,
that would be great, but in light of the above issues, I'd suggest
doing it in a new class (say, MultiMemoryIndex or similar).
I believe Mark has done some initial work in that direction, based on
an independent (and different) implementation strategy.
Wolfgang.
On May 2, 2006, at 12:25 AM, Robert Engels wrote:
Along the lines of Lucene-550, what about having a MemoryIndex that
accepts multiple documents, then writes the index once at the end,
during close, in the Lucene file format (so it could be merged)?
When adding documents using an IndexWriter, a new segment is created
for each document, and then the segments are periodically merged in
memory and/or with disk segments. It seems that when constructing an
index or updating a "lot" of documents in an existing index, the
write, read, merge cycle is inefficient, and if the document/field
information were maintained in order (TreeMaps), greater efficiency
would be realized.
With a memory index, the memory needed during update will increase
dramatically, but this could still be bounded, and a "disk based"
index segment written when too many documents are in the memory index
(max buffered documents).
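As a rough illustration of the idea above, here is a hypothetical sketch (not Lucene code; all names are invented) of buffering postings in a TreeMap so terms stay sorted, and "flushing" a segment once a maxBufferedDocs threshold is reached. The flush here just snapshots the map, standing in for writing a segment in the Lucene file format, which expects terms in sorted order:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: buffer multiple documents' postings in
// memory, keeping terms sorted via TreeMap, and flush a "segment"
// whenever maxBufferedDocs documents have accumulated.
class BufferedSegmentSketch {
    private final int maxBufferedDocs;
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();
    private final List<Map<String, List<Integer>>> flushedSegments = new ArrayList<>();
    private int bufferedDocs = 0;

    BufferedSegmentSketch(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    void addDocument(int docId, String[] tokens) {
        for (String term : tokens) {
            postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
        }
        if (++bufferedDocs >= maxBufferedDocs) {
            flush();
        }
    }

    // Stand-in for writing a real on-disk segment: TreeMap iteration
    // yields terms already sorted, so no extra sort pass is needed.
    void flush() {
        if (bufferedDocs == 0) return;
        flushedSegments.add(new TreeMap<>(postings));
        postings.clear();
        bufferedDocs = 0;
    }

    int segmentCount() { return flushedSegments.size(); }
}
```

The TreeMap keeps insertion cost at O(log n) per term but makes the flush a single sorted scan, which is the efficiency argument made above; memory stays bounded by the flush threshold.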
Does this "sound" like an improvement? Has anyone else tried
something like this?