I think that would work. But I'm not 100% sure of what you are trying to
achieve.
Just a notice:
Sorting on results has poor performance, if you have a large index, we ran into
severe performance problems with just a coupe of million articles which lead us
to modify the ranking instead.
Code for merge without optimize:
public virtual void AddIndexesWithoutMerge(Directory dir)
{
lock (this)
{
int start = segmentInfos.Count;
SegmentInfos sis = new SegmentInfos(); // read
infos from dir
sis.Read(dir);
for (int j = 0; j < sis.Count; j++)
{
segmentInfos.Add(sis.Info(j)); // add
each info
}
// merge newly added segments in log(n) passes
while (segmentInfos.Count > start + mergeFactor)
{
for (int base_Renamed = start + 1;
base_Renamed < segmentInfos.Count; base_Renamed++)
{
int end =
System.Math.Min(segmentInfos.Count, base_Renamed + mergeFactor);
if (end - base_Renamed > 1)
MergeSegments(base_Renamed, end);
}
}
MaybeMergeSegments();
}
}
(in indexwriter class (C# code as you notice, will probably look about the same
in java))
There is a AddIndexes method in the original implementation of indexwriter,
however that would cause a merge of files on disc my version causes the
directory to be writed to a new file (so if you put in a RamDirectory
containing 10.000 docs you will get a new file with 10.000 docs on disk, which
later will be merged by lucene when mergefactor is triggered on FS indexwriter).
/
Regards
Marcus
-----Ursprungligt meddelande-----
Från: Tobias Lohr [mailto:[EMAIL PROTECTED]
Skickat: den 17 januari 2008 15:15
Till: [email protected]
Ämne: Re: SV: SV: Integrating dynamic data into Lucene search/ranking
Thanks for your hint. If its possible I would take a look into the code, but
the approach is interesting.
What would you say to this approach I developed in my mind:
- Having an additional quite smaller index, were only the dynamic data resides
and is incorporated every N seconds with incremental index updates.
- Documents of the additional index have the same semantical "id" field, to
model a relation between them.
- A search is actually based on the index containing the searchable content,
but the sorting/ranking is done using a SortComparatorSource, which "extracts"
the information and calculates the score for the documents of the content
index.
What do you say?
-------- Original-Nachricht --------
> Datum: Thu, 17 Jan 2008 14:26:53 +0100
> Von: "Marcus Falk" <[EMAIL PROTECTED]>
> An: [email protected]
> Betreff: SV: SV: Integrating dynamic data into Lucene search/ranking
> In our solution we used a RAMDir for the newest incoming articles and a
> FSDir for older ones. Then we had a limit for the ramdir like 10.000
> documents when that limit were hit we used mergesegments to move the content
> from
> ramdir -> fsdir, actually we had to do some modification in the
> mergesegment method since it always seemed to do an optimize on the index
> after the
> merge, I have the code if u want it.
>
> If you use RAMDir + FSDir you can use 2 indexserchers and one
> multisearcher on top. The indexsearcher that uses the small RAMDir can be
> rebinded
> quite often.
>
> /
> Regards
> M
>
>
> -----Ursprungligt meddelande-----
> Från: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
> Skickat: den 17 januari 2008 10:55
> Till: [email protected]
> Ämne: Re: SV: Integrating dynamic data into Lucene search/ranking
>
> Tobias Lohr wrote:
> > I'm not really sure, if this approach is possible for working in changes
> every - let's say - 30 seconds!?
>
> The conventional wisdom is to use RAMDirectory in such scenarios. I.e.
> you commit frequent updates to a RAMDirectory and frequently reopen its
> Searcher (which should be fast). Periodically, merge the RAMDirectory
> index with your on-disk index - you need to open a new IndexSearcher in
> the background, warm it up with the latest N queries, and when it's
> ready you swap searchers, i.e. you close the old one, purge the
> RAMDirectory (since it was synced to the on-disk index), and start using
> the new IndexSearcher.
>
> And again, start accumulating new docs in the RAMDirectory, etc, etc ...
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
> > Datum: Thu, 17 Jan 2008 05:35:13 +0100
> > Von: "Marcus Falk" <[EMAIL PROTECTED]>
> > An: [email protected], [email protected]
> > Betreff: SV: Integrating dynamic data into Lucene search/ranking
>
> > We did this in our system, indexing a constant flow of news articles,
> > by doing as Otis described (reopened the indexsearcher)..
> >
> > Every 3:d minute we are creating a new indexsearcher in the background
> > after this searcher has been created we are fireing some warm up
> > queries against it and after that we change the old searcher to point to
> the new one.
> > Works fine for us and we got large indexes (several millions of
> articles)...
> >
> > /Regards
> > Marcus
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
--
Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
Ideal für Modem und ISDN: http://www.gmx.net/de/go/smartsurfer
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]