One thing I forgot to mention: in our implementation, the real-time indexing took place with many "folder-based" listeners writing to many tiny in-memory indexes partitioned by "sub-source", with fewer long-term and archive indexes per box. Overall distributed search across the various Lucene-based search services was done using a federator component, very much like shard-based searches are done today (I believe).
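The federator pattern described above, fanning a query out to many small per-partition indexes and merging the ranked results, can be sketched roughly as follows. This is a minimal language-agnostic model, not Lucene's actual MultiSearcher API; all class names, the overlap scoring, and the partition layout are invented for illustration:

```python
# Minimal sketch of a "federator" that fans a query out to several small
# partition indexes and merges results by score. All names are hypothetical;
# this models the control flow only, not Lucene's real API.

class TinyIndex:
    """One small in-memory index for a single 'sub-source' partition."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # doc_id -> set of terms

    def add(self, doc_id, text):
        self.docs[doc_id] = set(text.lower().split())

    def search(self, query):
        # Score = number of query terms matched (crude overlap scoring).
        terms = set(query.lower().split())
        hits = []
        for doc_id, doc_terms in self.docs.items():
            score = len(terms & doc_terms)
            if score > 0:
                hits.append((score, self.name, doc_id))
        return hits


class Federator:
    """Fans a query out to all partitions and merges the ranked results,
    much like a shard-based distributed search."""

    def __init__(self, partitions):
        self.partitions = partitions

    def search(self, query, top_n=10):
        merged = []
        for index in self.partitions:
            merged.extend(index.search(query))
        merged.sort(key=lambda hit: -hit[0])  # highest score first
        return merged[:top_n]
```

A real deployment would query the partitions concurrently and consult the long-term/archive indexes as well; this only illustrates the fan-out/merge control structure.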
-- Joaquin

On Fri, Dec 26, 2008 at 10:48 AM, J. Delgado <[email protected]> wrote:
> The addition of docs into tiny segments using the current data structures
> seems the right way to go. Some time back, one of my engineers implemented
> pseudo real-time search using MultiSearcher, by having an in-memory
> (RAM-based) "short-term" index that auto-merged into a disk-based
> "long-term" index, which eventually got merged into "archive" indexes.
> Index optimization would take place during these merges. The search we
> required was very time-sensitive (searching last-minute breaking news
> wires). The advantage of having an archive index is that very old
> documents in our application were not usually searched unless archives
> were explicitly selected.
>
> -- Joaquin
>
> On Fri, Dec 26, 2008 at 10:20 AM, Doug Cutting <[email protected]> wrote:
>> Michael McCandless wrote:
>>> So then I think we should start with approach #2 (build real-time on
>>> top of the Lucene core) and iterate from there. Newly added docs go
>>> into tiny segments, which IndexReader.reopen pulls in. Replaced or
>>> deleted docs record the delete against the right SegmentReader (and
>>> LUCENE-1314 lets reopen carry those pending deletes forward, in RAM).
>>>
>>> I would take the simple approach first: use an ordinary SegmentReader
>>> on a RAMDirectory for the tiny segments. If that proves too slow, swap
>>> in Memory/InstantiatedIndex for the tiny segments. If that proves too
>>> slow, build a reader impl that reads from the DocumentsWriter RAM
>>> buffer.
>>
>> +1 This sounds like a good approach to me. I don't see any fundamental
>> reasons why we need different representations, and fewer implementations
>> of IndexWriter and IndexReader is generally better, unless they get way
>> too hairy. Mostly it seems that real-time can be done with our existing
>> toolbox of data structures, but with some slightly different control
>> structures.
>>
>> Once we have the control structure in place, then we should look at
>> optimizing data structures as needed.
>>
>> Doug
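The tiered design described in the quoted messages, where new docs land in a small short-term index that periodically merges into a long-term index, which in turn rolls over into archive indexes searched only on request, can be modeled in a few lines. This is an illustrative sketch only: the thresholds and names are invented, and a real Lucene implementation of that era would have used RAMDirectory/FSDirectory with merges driven by IndexWriter:

```python
# Illustrative sketch of the short-term / long-term / archive tiering
# described in the thread. Thresholds and names are invented for the
# example; index "optimization" would happen during the merge steps
# in the real system.

class TieredIndex:
    def __init__(self, short_term_limit=3, long_term_limit=6):
        self.short_term = []   # tiny in-memory tier, searchable immediately
        self.long_term = []    # disk-based tier
        self.archive = []      # old documents, searched only on request
        self.short_term_limit = short_term_limit
        self.long_term_limit = long_term_limit

    def add(self, doc):
        # New docs always enter the short-term tier (pseudo real-time:
        # they are searchable before any merge happens).
        self.short_term.append(doc)
        if len(self.short_term) >= self.short_term_limit:
            self._merge_short_into_long()

    def _merge_short_into_long(self):
        self.long_term.extend(self.short_term)
        self.short_term = []
        if len(self.long_term) >= self.long_term_limit:
            # Roll the long-term tier into the archive.
            self.archive.extend(self.long_term)
            self.long_term = []

    def search(self, predicate, include_archive=False):
        # Archives are skipped unless explicitly selected, matching the
        # behavior described for the news-wire application.
        tiers = self.short_term + self.long_term
        if include_archive:
            tiers = tiers + self.archive
        return [doc for doc in tiers if predicate(doc)]
```

The key property is that `add` makes a document visible to `search` immediately, while the merge cascade keeps the hot tier tiny, which is the same control structure the thread proposes building from Lucene's existing data structures.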
