Hi, I am trying to implement a kind of faceted search using Lucene 2.4.0.
I have a list of configuration rules that tell me how to generate this facets and the corresponding queries (that can range from simple term queries to complex boolean queries). When my application starts, it creates the whole set of facets objects and initializes them. For each facet: - I create the query according to the configured rule; - I ask the reader for the bitset corresponding to that query and I store it in the Facet object; - I get the cardinality of the bitset and I save it in the Facet object as its "initial count". When the user does a search I have to update the "counts" associated to each Facet: - I get the bitset corresponding to the "query + filter" generated by the user search; - I get the cardinality of the ("search bitset" AND "facet bitset") and I save it as the updated count. In my first solution, I used only "OpenBitSetDISI" objects, both for Facet bitset and for search bitset. So I could use "intersectionCount" method to get updated counts after user search. This works very well and it is very fast, but when the number of documents in the index and the number of facets grows it is too memory consuming. So I tried a different solution: when I create facet bitsets I use the same rule applied in ChainedFilter/BooleanFilter to decide if I have to store an OpenBitSet or a SortedVIntList. When I have to calculate updated counts: - if the facet has an OpenBitSet, I use the "intersectionCount" method directly; - if the facet has a SortedVIntList, I first create a new OpenBitSetDISI using the SortedVIntList.iterator and then I use the "intersectionCount" method. In this way, I use a smaller amount of memory at initialization time, but for each user search I create a large number of objects (that I suddenly throw away) and this affects application performance because it wastes a lot of time doing GC. So my question is: is there a better way to accomplish this task? I think, it would be fine if I could calculate "intersectionCount" directly on SortedVIntList objects, but I have not found nothing like that in Lucene 2.4 JavaDoc. Am I missing something? As a reference, now my index contains more than 500.000 documents and I have to create/manage up to 50.000 facets. Using "second solution", at initialization time my facets structure requires more or less 120MB (and this is good enough), while updating counts it uses even 2GB of memory (and this is very bad). Thanks in advance, Raf