Re: Category count.

Erik Hatcher Mon, 09 Nov 2009 07:10:05 -0800


On Nov 9, 2009, at 9:55 AM, André Maldonado wrote:

Solr has an XML API, correct? So it can be used with .NET.

Yes, documents can come in as XML (<doc><field>value</field>...</doc>), or as CSV, or as rich documents (like PDF, Word, etc) over HTTP.

Several clients of ours are using Solr from .NET environments. One of them is even, interestingly, combining Solr as part of a SQL Server query using an extension point.


        Erik

Or am I wrong?

Thanks
"Then those who were in the boat came and worshipped him, saying: Truly you are the Son of God." (Matthew 14:33)
On Mon, Nov 9, 2009 at 12:14, Erik Hatcher <erik.hatc...@gmail.com> wrote:
Note that Solr has faceting built in, and uses Lucene's goodness too. And
it scales quite well.

      Erik



On Nov 9, 2009, at 8:12 AM, Moray McConnachie wrote:

This is basically Lucene for faceted search I think?
Most approaches I have seen to this involve caching results and/or
duplicating the facet information in an alternate data store.
The best resource I have seen uses cached results. It permits you to drill down into multiple facets and get the number of documents per facet updated easily, without going back to the Lucene engine with multiple queries.
http://www.devatwork.nl/index.php/articles/lucenenet/faceted-search-and-drill-down-lucenenet/
1) at initialisation (and/or at set points) step through all the potential
facet values and store the matching results in some kind of cached
dictionary of bit arrays
2) the user drills down into whatever facets
3) you AND together the bit arrays representing each facet the user is in
4) you count the number of positive bits in the resulting bit array to get
the number of articles matched.
At 3) you could clearly AND this together with any other Lucene result set
to get accurate counts when you are integrating facets and non-faceted
search results.
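
Steps 1) to 4) above can be sketched roughly as follows. This is only an illustration in plain Java using java.util.BitSet, not code from the linked article and not part of Lucene or Lucene.Net: the class FacetBitSets and its methods add and count are hypothetical names, and it assumes document ids are small, dense integers that can be used directly as bit positions.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not a Lucene or Lucene.Net class.
final class FacetBitSets {
    // One bit array per facet value; bit n is set when document n has that value.
    private final Map<String, BitSet> bitsByFacetValue = new HashMap<>();

    // Step 1: record which documents carry which facet values.
    void add(int docId, String... facetValues) {
        for (String value : facetValues) {
            bitsByFacetValue.computeIfAbsent(value, k -> new BitSet()).set(docId);
        }
    }

    // Steps 3 and 4: AND the selected facets together (plus an optional
    // result set from a normal search), then count the bits that remain.
    int count(BitSet baseResults, String... selectedFacetValues) {
        BitSet combined = (baseResults == null) ? null : (BitSet) baseResults.clone();
        for (String value : selectedFacetValues) {
            BitSet bits = bitsByFacetValue.get(value);
            if (bits == null) {
                return 0; // unknown facet value: nothing can match
            }
            if (combined == null) {
                combined = (BitSet) bits.clone(); // copy so the cache is never modified
            } else {
                combined.and(bits);
            }
        }
        return (combined == null) ? 0 : combined.cardinality();
    }
}
```

To show a count next to every facet value the user could still pick, you would call count once per candidate value with the current selection plus that value; each call is a handful of in-memory AND operations rather than a search.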
The approach works best the higher the ratio of queries to updates; it
will work poorly for applications with any or all of

a) very frequent updating
b) the need for facets to be 100% accurate in real time
c) a large number of potential facet values (initialisation could be very
slow)
With a little extra work on the indexing end you could conquer a) and b)
and hopefully get round the need to reinitialise from scratch.

I'm not sure how well it would work with very large datasets either,
particularly where the number of matches in some facet is very large - I've
never had to work with bit arrays of millions of bits!

I like this approach because it is a 100% Lucene solution and it is
(relatively) fast compared to your approach so far and other similar
approaches.
Faceting is such a common meme for search, I can foresee someone porting
faceting functionality into the back end if indeed it is not already
happening?

Yours,
Moray


-------------------------------------
Moray McConnachie
Director of IT    +44 1865 261 600
Oxford Analytica  http://www.oxan.com

-----Original Message-----
From: André Maldonado [mailto:andre.maldon...@gmail.com]
Sent: 09 November 2009 12:44
To: lucene-net-user@incubator.apache.org
Subject: Category count.
Hi all. I have a problem that is exactly like this one (which was written by
another developer):
"I am trying to use Lucene Java 2.3.2 to implement search on a catalog of
products. Apart from the regular fields for a product, there is a field called
'Category'. A product can fall in multiple categories. Currently, I use
FilteredQuery to search for the same search term with every Category to get
the number of results per category.

This results in 20-30 internal search calls per query to display the
results. This is slowing down the search considerably. Is there a faster way
of achieving the same result using Lucene?"
But in the thread where I found this question, I didn't find any good
solution.

Can you help me?

Thanks
"Then those who were in the boat came and worshipped him, saying: Truly you are the Son of God." (Matthew 14:33)
