Hi Again,
I have a situation where I want to facet over a MultiSearcher. I ran into a
great post
(http://mail-archives.apache.org/mod_mbox/incubator-lucene-net-user/200712.mbox/%[email protected]%3e)
which led me to learn a lot more about the inner workings of Lucene, but
admittedly not enough. So here is my test setup:
Test machine:
2 x dual-core, 3.4 GHz
3 GB memory
Index:
12 indexes
11 GB combined (please don't ask me why)
~3 million docs
For argument's sake, half of these indexes are optimized.
My facet is constrained to 90K hits, meaning I run a search for some keyword
(matching 90K of the 3 million docs) and want facet counts on 9 fields
against those results.
Some performance numbers...
1st search (or warming) = ~1.7 seconds
Mem consumption = ~100 MB
The problems I am facing are A) warming performance with facets and B) memory
consumption.
Post-warming search with 9 facet fields:
Mem consumption = ~1.3 GB
~48 seconds (~90K hits)
A second search with facets takes about 15 ms for the search and 200 ms for the
facets, which is well within everything I ever wanted out of Lucene (and then
some), even though memory stays stuck at ~1.3 GB.
In a production scenario, however, even if I could get away with the warming
times and memory consumption, this scenario is what we call a "group", and we
have hundreds of them (even if only a fraction are running in parallel). So I
have to get these initial numbers way down, but I am not sure how to do it.
In a very crude first-pass attempt to adapt Jokin's awesome contribution, in
its simplest form, I came up with this:
public static IEnumerable<KeyValuePair<string,int>> Facet(Query query,
    MultiSearcher s, string Field, int max)
{
    Dictionary<string,int> result = new Dictionary<string,int>();
    // Fetch the sub-searchers once instead of on every loop iteration.
    Searchable[] searchables = s.GetSearchables();
    for (int q = 0; q < searchables.Length; q++)
    {
        //TODO: don't assume an IndexSearcher is the basis for searchables
        IndexSearcher searcher = (IndexSearcher)searchables[q];
        // Loads the terms of Field for this index through the FieldCache.
        StringIndex stringIndex =
            FieldCache_Fields.DEFAULT.GetStringIndex(searcher.Reader, Field);
        // One counter per entry in this index's term lookup table.
        int[] c = new int[stringIndex.lookup.Length];
        FacetCollector results = new FacetCollector(c, stringIndex);
        searcher.Search(query, results);
        ....
        ....
    }
    return result;
}
The elided part ("....") basically merges the results together and takes the
top hits, but it doesn't affect the numbers given above.
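To give an idea of what that merge step amounts to, here is a minimal sketch
rather than the actual elided code. It assumes each index yields a term lookup
array and a parallel array of counts (as filled in by the collector); the
FacetMerge class and its Accumulate and Top methods are hypothetical names
made up for illustration, not part of Lucene.Net.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Lucene.Net.
public static class FacetMerge
{
    // Adds one index's per-term counts into the running totals.
    // lookup[i] is the term for ordinal i, counts[i] its hit count.
    // Null terms (documents with no value) and zero counts are skipped.
    public static void Accumulate(Dictionary<string, int> totals,
                                  string[] lookup, int[] counts)
    {
        for (int i = 0; i < lookup.Length; i++)
        {
            if (lookup[i] == null || counts[i] == 0) continue;
            int current;
            totals.TryGetValue(lookup[i], out current); // 0 when the term is new
            totals[lookup[i]] = current + counts[i];
        }
    }

    // Returns the max most frequent terms, ties broken by term text.
    public static List<KeyValuePair<string, int>> Top(
        Dictionary<string, int> totals, int max)
    {
        return totals.OrderByDescending(p => p.Value)
                     .ThenBy(p => p.Key, StringComparer.Ordinal)
                     .Take(max)
                     .ToList();
    }
}
```

Merging by term text rather than by ordinal matters here, because each index
has its own lookup table and the same ordinal can mean a different term in
each one.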
Can anyone shed some light on methods for faceting across multiple indexes? I
knew when writing this code this afternoon that I was going to need to AND (&&)
some bit sets, but I am still just getting familiar with the inner workings of
Lucene, and if anyone could point me in the right direction I would be grateful!
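To make the bit-set idea concrete, this is roughly what I have in mind, as a
sketch only. It uses the plain .NET System.Collections.BitArray rather than
anything from Lucene, and the BitSetFacets class and CountIntersection method
are hypothetical names. It assumes that, per index, I can get one bit set
marking the documents the query hit and one marking the documents that carry a
given facet value.

```csharp
using System.Collections;

// Hypothetical helper, not part of Lucene.Net.
public static class BitSetFacets
{
    // Counts the documents that are both in the query's hits and carry the
    // facet value. Both bit sets must have one bit per document of the SAME
    // index, so they have equal length (BitArray.And throws otherwise).
    public static int CountIntersection(BitArray queryHits, BitArray termDocs)
    {
        // And() modifies its receiver, so work on a copy of the query hits.
        BitArray both = new BitArray(queryHits).And(termDocs);
        int count = 0;
        foreach (bool bit in both)
        {
            if (bit) count++;
        }
        return count;
    }
}
```

With a MultiSearcher this would presumably have to run once per sub-index,
since document numbers are only meaningful within a single index, with the
per-index counts summed afterwards.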
Thanks!
Grahem