Hi Again,

 I have a situation where I want to facet over a MultiSearcher. I ran into a 
great post 
(http://mail-archives.apache.org/mod_mbox/incubator-lucene-net-user/200712.mbox/%[email protected]%3e)
 which led me to learn a lot more about the inner workings of Lucene, but 
admittedly not enough. So here is my test setup:

 

Test Machine:
2 x dual-core, 3.4 GHz
3 GB memory

 

Index:

12 indexes
11 GB combined (please don't ask me why)
~3 million docs
For argument's sake, half of these indexes are optimized

 

My facet is constrained to 90K hits, meaning I run a search for some keyword 
(matching 90K of the 3 million docs) and am looking for facet counts on 9 fields 
against those results.

 

Some performance numbers...

1st search (or warming) = ~1.7 seconds

Mem consumption ~100Megs

 

The problems I am facing are (A) warm-up time with facets and (B) memory 
consumption.

 

Post warming search with 9 facet fields:

Mem consumption = ~1.3G
~48 seconds (~90K hits)

 

The second search with facets takes about 15 ms for the search and 200 ms for 
the facets, well within everything I ever wanted out of Lucene (and then some, 
even though memory stays stuck at ~1.3 GB).

 

In a production scenario, however, even if I could get away with the warming 
times and mem consumption - this scenario is what we call a "group" and we have 
hundreds of them (even if only a fraction are running in parallel). So I have 
to get these initial numbers way down, but I am not sure how to do it....

 

In a very crude first-pass attempt to adapt Jokin's awesome contribution, in 
its simplest form, I came up with this:

 

public static IEnumerable<KeyValuePair<string,int>> Facet(Query query, MultiSearcher s, string Field, int max)
{
       Dictionary<string,int> result = new Dictionary<string,int>();

       // Fetch the sub-searchers once instead of on every loop iteration
       Searchable[] searchables = s.GetSearchables();

       for (int q = 0; q < searchables.Length; q++)
       {
           //TODO: don't assume an IndexSearcher is the basis for searchables
           IndexSearcher searcher = (IndexSearcher)searchables[q];

           StringIndex stringIndex = FieldCache_Fields.DEFAULT.GetStringIndex(searcher.Reader, Field);

           // One counter per distinct term of this field in this sub-index
           int[] c = new int[stringIndex.lookup.Length];
           FacetCollector results = new FacetCollector(c, stringIndex);
           searcher.Search(query, results);

           ....
           ....
       }

       return result;
}

 

The "...." part basically merges the results together and picks the top hits, 
but it doesn't affect the numbers given above.
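
To make the merge step concrete, here is a minimal sketch of how I picture it, written against plain arrays so it runs without Lucene. Everything in it is an assumption for illustration: IndexFacetData is a hypothetical stand-in for one sub-index's StringIndex (Lookup for lookup, Order for order) plus the doc ids the query matched there, and I am assuming that ordinal 0 maps to a null term for docs without a value. The point is that ordinals are only meaningful inside one sub-index, so the per-index counts have to be merged by term text, not by ordinal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FacetMergeSketch
{
    // Hypothetical stand-in for one sub-index: Lookup[ord] is the term text,
    // Order[doc] is the ordinal of that doc's term, HitDocs are the matching doc ids.
    public sealed class IndexFacetData
    {
        public string[] Lookup;
        public int[] Order;
        public int[] HitDocs;
    }

    // Count hits per ordinal within a single sub-index.
    public static int[] CountOrdinals(IndexFacetData data)
    {
        int[] counts = new int[data.Lookup.Length];
        foreach (int doc in data.HitDocs)
        {
            counts[data.Order[doc]]++;
        }
        return counts;
    }

    // Merge per-index counts by term text and return the top 'max' terms.
    public static List<KeyValuePair<string, int>> MergeTop(IEnumerable<IndexFacetData> indexes, int max)
    {
        var merged = new Dictionary<string, int>();
        foreach (IndexFacetData data in indexes)
        {
            int[] counts = CountOrdinals(data);
            for (int ord = 0; ord < counts.Length; ord++)
            {
                string term = data.Lookup[ord];
                // Skip the assumed "no value" slot and terms with no hits.
                if (term == null || counts[ord] == 0) continue;
                int current;
                merged.TryGetValue(term, out current); // current is 0 when the term is new
                merged[term] = current + counts[ord];
            }
        }
        // Highest count first; ties broken by term text so the order is stable.
        return merged
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Take(max)
            .ToList();
    }
}
```

The dictionary only ever holds terms that actually had hits, so the merge cost depends on the number of distinct facet values hit, not on the size of each lookup array.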

 

Can anyone shed some light on methods to do faceting across multiple indexes? I 
knew when writing this code this afternoon that I was going to need to AND some 
bit sets, but I am still just getting familiar with the inner workings of 
Lucene, and if anyone could point me in the right direction I would be grateful!
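
For reference, this is roughly what I mean by ANDing bit sets, as a sketch using the .NET BitArray class and not anything from Lucene itself. It assumes that, for a single sub-index, I already have one bit set marking the docs the query matched and one marking the docs that carry a given facet value; how to build those efficiently is exactly the part I am unsure about. Since doc ids are local to each sub-index, I assume the per-index counts would then just be summed.

```csharp
using System;
using System.Collections;

public static class BitSetFacetSketch
{
    // Number of docs that are both query hits and carry the facet value.
    // Both bit sets must have the same length (one bit per doc in the sub-index).
    public static int CountIntersection(BitArray queryHits, BitArray facetDocs)
    {
        // BitArray.And modifies the instance it is called on, so AND into a copy.
        BitArray both = new BitArray(queryHits).And(facetDocs);

        int count = 0;
        foreach (bool bit in both)
        {
            if (bit) count++;
        }
        return count;
    }
}
```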

Thanks!

Grahem

 

 

 
