Jokin

Can't thank you enough. I implemented the changes you suggested, along with
the Solr-style faceting using your class!
My initial tests show an order of magnitude improvement in performance. I'll
have the entire bunch of changes implemented and then report on the query
timings.

Thanks again!



----- Original Message ----
From: Jokin Cuadrado <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, December 19, 2007 10:04:57 AM
Subject: Re: Faceting in Lucene.Net


First of all, take a look at
http://wiki.apache.org/lucene-java/BasicsOfPerformance

Some things that I have noticed:
-  You are opening an IndexReader for every request. You should have a
shared IndexReader: term vectors and QueryFilter results, for example,
are cached in it, so if you reopen the IndexReader for every request
they have to be rebuilt from the index each time.

-  For categories with many distinct values, or searches with small
result sets, you should use the collector approach.
I attached a file: it's a custom translation to C# of the ideas behind
the faceted search in Solr. The usage is simple: once you have built
the query, call (category is the field to facet on):
SimpleFacets.facet(query, lucene_searcher, "category", MaxResults)

It will return a collection of value - count entries. If you set
MaxResults the collection will be limited to that size; if not, it
will contain one entry per category value. The code may include
conditions that are specific to our index, so you might have to tweak
it a bit. As you can see in the code, I left the class in the namespace
Lucene.Net.Util, so you have to reference or import it.
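
The attached SimpleFacets class is not reproduced in this thread, so the
following is only a rough sketch of the counting idea described above, not
the actual implementation. It assumes the matching document ids are already
available and that fieldValues[docId] holds the facet value of each document
(roughly what a field cache lookup would provide). The names FacetSketch and
CountFacets are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration; not the SimpleFacets class mentioned in the thread.
public static class FacetSketch
{
    // matchingDocIds: ids of the documents that matched the query.
    // fieldValues:    per-document facet value (assumed to be preloaded); null = no value.
    // maxResults:     limit on the number of entries returned; 0 or less means no limit.
    public static List<KeyValuePair<string, int>> CountFacets(
        IEnumerable<int> matchingDocIds, string[] fieldValues, int maxResults)
    {
        var counts = new Dictionary<string, int>();
        foreach (int docId in matchingDocIds)
        {
            string value = fieldValues[docId];
            if (value == null) continue;          // document has no value for this field
            counts.TryGetValue(value, out int n); // n stays 0 when the value is new
            counts[value] = n + 1;
        }

        // Highest count first; ties broken by value so the order is deterministic.
        IEnumerable<KeyValuePair<string, int>> ordered = counts
            .OrderByDescending(e => e.Value)
            .ThenBy(e => e.Key, StringComparer.Ordinal);

        if (maxResults > 0) ordered = ordered.Take(maxResults);
        return ordered.ToList();
    }
}
```

The cost here grows with the number of hits rather than with the number of
distinct values, which is why this style suits small result sets over fields
with many categories.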

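On the first point above (sharing one reader/searcher across requests), a
minimal sketch of the idea could look like the following. It is a generic
holder, not part of Lucene.Net; the class name SharedInstance and its
members are hypothetical. It opens the instance once, hands the same one to
every caller, and reopens it only after Invalidate() has been called, for
example once an index update has finished.

```csharp
using System;

// Hypothetical helper; not a Lucene.Net class.
public sealed class SharedInstance<T> where T : class
{
    private readonly Func<T> _open;               // how to open a fresh instance
    private readonly object _gate = new object();
    private T _current;

    public SharedInstance(Func<T> open)
    {
        _open = open ?? throw new ArgumentNullException(nameof(open));
    }

    // Returns the shared instance, opening it on first use.
    public T Get()
    {
        lock (_gate)
        {
            if (_current == null) _current = _open();
            return _current;
        }
    }

    // Forgets the current instance so the next Get() opens a new one.
    // Closing the old instance is deliberately left out of this sketch,
    // because other requests may still be using it.
    public void Invalidate()
    {
        lock (_gate) { _current = null; }
    }
}
```

With the constructor used in the snippet quoted below, this could be created
once per application as
new SharedInstance<IndexSearcher>(() => new IndexSearcher(indexloc)),
with Invalidate() called after each index update.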

On Dec 19, 2007 5:09 PM, Soormasher Singh <[EMAIL PROTECTED]> wrote:
> Thanks a lot for your response.
> My index isn't big: it has only around 100k to 120k documents at any
> given time. But it does get updated roughly every 2 hours (new
> documents are added). Then every night, the entire index is rebuilt
> to exclude the deleted documents.
> I've tried both of the approaches you mentioned, but the performance
> appears rather slow. Without faceting, I can do a search on this
> index (including some math calculations) in around 40 to 80 ms.
> When I include faceting for categories that are predefined (3
> different fields with 2 or 3 distinct values), the query time jumps
> quite a bit, to around 200 ms.
> So my typical query would be a Boolean query with at least 4
> sub-queries, with faceting over 3 'static' fields (with 2 or 3
> distinct values) and 2 'dynamic' fields (with thousands of distinct
> values).
> When I do faceting with a field that has tens of thousands of
> distinct values, the query time jumps drastically, to over 1 second.
> Here are some snippets of code:
>
> With smaller categories:
>
>         SortedList sl = new SortedList();
>         string indexloc = ConfigurationManager.AppSettings["DocIndexLoc"];
>         IndexSearcher searcher = new IndexSearcher(indexloc);
>
>         foreach (string s in SearchFilters.SourceTypes())
>         {
>             TermQuery tq = new TermQuery(new Term("SourceType", s));
>             Filter f = new QueryFilter(tq);
>
>             sl.Add(s, searcher.Search(this.bq, f).Length());
>         }
>
>         return sl;
>
> With bigger categories:
>
>         IndexReader reader = searcher.GetIndexReader();
>         QueryFilter baseQueryFilter = new QueryFilter(this.bq);
>         BitArray baseBitSet = baseQueryFilter.Bits(reader);
>
>         if (Cache["cities"] == null)
>         {
>             Cache["cities"] = Utilities.TopCities();
>         }
>
>         SortedList sl = new SortedList();
>
>         foreach (string s in (ArrayList)Cache["cities"])
>         {
>             TermQuery tq = new TermQuery(new Term("city", s));
>             Filter f = new QueryFilter(tq);
>             BitArray baCity = f.Bits(reader);
>
>             baCity.And(baseBitSet);
>             // do the cardinality function here
>         }
>
> Am I doing something that is not so efficient? Any suggestions on
> boosting performance?
>
> Thanks a lot for your help!
>
> ----- Original Message ----
> From: Jokin Cuadrado <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Tuesday, December 18, 2007 2:44:12 AM
> Subject: Re: Faceting in Lucene.Net
>
>
> Could you be more explicit about your needs? How many documents your
> index has, how many different categories there are, and the average
> number of hits per search would be enough to suggest an approach.
>
> In my case I made a custom collector to count the hits in every
> category, using a FieldCache entry to get the value efficiently
> instead of calling hit.getDocument (a performance killer).
>
> This is better if your searches return small result sets and you
> have many categories.
>
> If you don't have many terms, and your searches return many results,
> you can use QueryFilter.Bits to get the masks, AND them, and count
> the number of set bits in the result. This has the drawback that the
> .NET implementation of BitArray doesn't have an efficient method for
> counting the set bits (cardinality in Java), but you could get one
> from the BitVector class in Lucene.Net (you must use your own
> implementation of BitArray, or use reflection to access the backing
> Int32 array m_array and count over it).
>
> Here is the function to get the number of ones set in a BitArray:
>
>  Private Shared _bitsSetArray256 As Byte() = {0, 1, 1, 2, 1, 2, 2, 3,
> 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4,
> 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4,
> 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3,
> 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4,
> 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5,
> 6, 5, 6, 6, 7, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3,
> 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3,
> 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
> 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5,
> 6, 4, 5, 5, 6, 5, 6, 6, 7, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6,
> 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8}
>
>     ''' <summary>
>     ''' return the number of bits on bitarray set to one
>     ''' </summary>
>     ''' <remarks></remarks>
>     Private Function Cardinality(ByVal bits As BitArray) As Int32
>         Dim arr As UInt32()
>         arr = bits.GetType().GetField("m_array",
> Reflection.BindingFlags.NonPublic Or
> Reflection.BindingFlags.Instance).GetValue(bits)
>         Dim _count As Int32 = 0
>         For i As Int32 = 0 To arr.Length - 1
>             _count += _bitsSetArray256(arr(i) And &HFF) + _
>                 _bitsSetArray256((arr(i) >> 8) And &HFF) + _
>                 _bitsSetArray256((arr(i) >> 16) And &HFF) + _
>                 _bitsSetArray256(arr(i) >> 24)
>         Next i
>         Return _count
>     End Function
>
> On Dec 16, 2007 7:33 PM, Soormasher Singh <[EMAIL PROTECTED]> wrote:
> > Hello All
> >
> > I'm trying to use Lucene.Net for faceting (category counting and
> > search refinement). I've not been able to find any examples of this
> > using Lucene.Net. I've tried to use the approach used in Solr, but
> > the performance hasn't been the greatest.
> > Can anyone please help me with this? Any code/examples of anyone
> > using Lucene.Net for category counting/faceting?
> >
> > Thanks a bunch!