Thanks a lot for your response.
My index isn't big; it has only around 100k to 120k documents at any given time.
But it does get updated roughly every 2 hours (new documents are added). Then
every night, the entire index is rebuilt to exclude the deleted documents.
I've tried both the approaches you mentioned but the performance appears rather
slow. Without faceting, I can do a search on this index (including some math
calculations) in around 40 to 80 ms.
When I include faceting for categories that are predefined (3 different fields
with 2 or 3 distinct values), the query time jumps quite a bit to around 200 ms.
So my typical query would be a Boolean query with at least 4 sub-queries, with
faceting over 3 'static' fields (2 or 3 distinct values each) and 2 'dynamic'
fields (thousands of distinct values each).
When I do faceting with a field that has tens of thousands of distinct values,
the query time jumps drastically to over 1 second.
Here are some snippets of code:
With smaller categories:
    SortedList sl = new SortedList();
    string indexloc = ConfigurationManager.AppSettings["DocIndexLoc"];
    IndexSearcher searcher = new IndexSearcher(indexloc);
    foreach (string s in SearchFilters.SourceTypes())
    {
        TermQuery tq = new TermQuery(new Term("SourceType", s));
        Filter f = new QueryFilter(tq);
        sl.Add(s, searcher.Search(this.bq, f).Length());
    }
    return sl;
With bigger categories:
    IndexReader reader = searcher.GetIndexReader();
    QueryFilter baseQueryFilter = new QueryFilter(this.bq);
    BitArray baseBitSet = baseQueryFilter.Bits(reader);
    if (Cache["cities"] == null)
    {
        Cache["cities"] = Utilities.TopCities();
    }
    SortedList sl = new SortedList();
    // Read the list back from the same store it was cached in.
    foreach (string s in (ArrayList)Cache["cities"])
    {
        TermQuery tq = new TermQuery(new Term("city", s));
        Filter f = new QueryFilter(tq);
        BitArray baCity = f.Bits(reader);
        baCity.And(baseBitSet);
        //do the cardinality function here
    }
Am I doing something that is not so efficient? Any suggestions on boosting
performance?
Thanks a lot for your help!
----- Original Message ----
From: Jokin Cuadrado <[EMAIL PROTECTED]>
To: [email protected]
Sent: Tuesday, December 18, 2007 2:44:12 AM
Subject: Re: Faceting in Lucene.Net
Could you be more explicit about your needs? How many documents are in your
index, how many different categories there are, and the average number of
search hits would be enough to suggest an approach.
In my case I wrote a custom collector that counts the hits in every
category, using a FieldCache entry to get each document's value efficiently
instead of calling hit.getDocument (a performance killer).
This works better if your searches return small result sets and you have
many categories.
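
The counting idea behind that collector can be sketched without any Lucene.Net
types. This is only an illustration, not the code described above: the array
valueByDoc is a hypothetical stand-in for what a field cache would provide
(one field value per document number), and hitDocIds stands in for the
document numbers the collector receives. The class and method names are made
up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class FacetCountSketch
{
    // valueByDoc[docId] holds the category value of that document
    // (assumed stand-in for a field cache array; null means "no value").
    // hitDocIds are the document numbers matched by the search.
    public static Dictionary<string, int> CountByCategory(
        string[] valueByDoc, IEnumerable<int> hitDocIds)
    {
        var counts = new Dictionary<string, int>();
        foreach (int docId in hitDocIds)
        {
            string value = valueByDoc[docId];
            if (value == null)
            {
                continue; // document has no value for this field
            }
            int current;
            counts.TryGetValue(value, out current); // current is 0 if missing
            counts[value] = current + 1;
        }
        return counts;
    }
}
```

The cost is one array lookup and one dictionary update per hit, so it grows
with the number of hits rather than with the number of distinct categories,
which is why it suits small result sets over many categories.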
If you don't have many terms and your searches return many results, you
can use QueryFilter.Bits to get the masks, AND them, and count the
number of set bits in the result. The drawback is that the .NET
BitArray has no efficient method for counting set bits
(cardinality in Java), but you could take one from the
BitVector class in Lucene.Net (you must then use your own implementation of
BitArray, or use reflection to access the backing Int32 array m_array
and count over it).
Here is a function that counts the bits set to one in a BitArray:
Private Shared _bitsSetArray256 As Byte() = {0, 1, 1, 2, 1, 2, 2, 3,
1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4,
5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4,
4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3,
4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4,
3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5,
6, 5, 6, 6, 7, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3,
3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3,
4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5,
6, 4, 5, 5, 6, 5, 6, 6, 7, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6,
6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8}
''' <summary>
''' Returns the number of bits set to one in the BitArray.
''' </summary>
''' <remarks></remarks>
Private Function Cardinality(ByVal bits As BitArray) As Int32
    Dim arr As UInt32()
    arr = bits.GetType().GetField("m_array", _
        Reflection.BindingFlags.NonPublic Or _
        Reflection.BindingFlags.Instance).GetValue(bits)
    Dim _count As Int32 = 0
    For i As Int32 = 0 To arr.Length - 1
        _count += _bitsSetArray256(arr(i) And &HFF) + _
            _bitsSetArray256((arr(i) >> 8) And &HFF) + _
            _bitsSetArray256((arr(i) >> 16) And &HFF) + _
            _bitsSetArray256(arr(i) >> 24)
    Next i
    Return _count
End Function
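
For the C# side of the thread, a possible alternative that avoids reflection
is to copy the bits out through the public BitArray.CopyTo method and count
them word by word. This is a sketch under assumptions, not a drop-in from
Lucene.Net: the class and method names are hypothetical, and it masks the
unused high bits of the last word itself rather than relying on how a given
runtime fills them.

```csharp
using System;
using System.Collections;

public static class BitArrayCounting
{
    // Counts the bits set to one, using only public BitArray members.
    public static int Cardinality(BitArray bits)
    {
        int[] words = new int[(bits.Length + 31) / 32];
        bits.CopyTo(words, 0);
        int extraBits = bits.Length % 32; // bits in use in the last word
        int count = 0;
        for (int i = 0; i < words.Length; i++)
        {
            uint w = unchecked((uint)words[i]);
            if (i == words.Length - 1 && extraBits != 0)
            {
                w &= (1u << extraBits) - 1u; // ignore bits past Length
            }
            while (w != 0)
            {
                w &= w - 1; // clears the lowest set bit
                count++;
            }
        }
        return count;
    }

    // Counts documents present in both sets without modifying either input.
    // Both arrays must have the same Length, otherwise And throws.
    public static int IntersectionCount(BitArray baseBits, BitArray categoryBits)
    {
        BitArray copy = new BitArray(categoryBits);
        return Cardinality(copy.And(baseBits));
    }
}
```

In the city loop shown earlier, the "do the cardinality function here" step
would then be a call such as Cardinality(baCity) after the And. Whether this
is faster than the lookup-table version above would need measuring on the
real index.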
On Dec 16, 2007 7:33 PM, Soormasher Singh <[EMAIL PROTECTED]> wrote:
> Hello All
>
> I'm trying to use Lucene.Net for faceting (category counting and
> search refinement). I've not been able to find any examples of this using
> Lucene.Net. I've tried to use the approach used in Solr, but the
> performance hasn't been the greatest.
> Can anyone please help me with this? Any code/examples of anyone
> using Lucene.Net for category counting/faceting?
>
> Thanks a bunch!