"the lucene module requires users to decide at indexing time what and how to facet whereas Solr does everything at searching time"

It would be nice to have some confirmation or clarification of that. Are Lucene facets "static" in any sense? Which decisions does an app developer need to make upfront, such that they can only be changed with a full reindex of the data?

I'm trying to get a handle on whether Lucene Facets is a guru-level feature or something that an average Lucene user can trivially master with, say, 5 minutes of reading. Or is it the kind of feature that mainly interests the developers of higher-level search platforms such as Solr and ElasticSearch, as opposed to the users of those platforms?

-- Jack Krupansky

-----Original Message----- From: Adrien Grand
Sent: Thursday, December 13, 2012 7:03 AM
To: dev@lucene.apache.org
Subject: Re: Solr faceting vs. Lucene faceting

Hi Shai,

On Thu, Dec 13, 2012 at 12:21 PM, Shai Erera <ser...@gmail.com> wrote:
> As I said, if someone volunteers to do some work on the Solr side, I will
> gladly participate in that effort.
> I just don't even know where to start w/ Solr :).

The entry point for Solr facets is
org.apache.solr.request.SimpleFacets.getFacetCounts (called from
FacetComponent).

> One thing that would be really great is if we can build an adapter (I think
> someone mentioned that word here) which supports basic facets capabilities,
> so that we can at least benchmark Solr's current implementation vs the
> implementation w/ the module.

Comparing both impls would be great, but an adapter might be hard to
write given how Lucene faceting differs from Solr faceting: the lucene
module requires users to decide at indexing time what and how to facet
whereas Solr does everything at searching time, using FieldCache and
UnInvertedField, which means that you can compute facets on any field
that is indexed (there is even an issue open in order to be able to
compute facet counts based on arbitrary functions [1]). So Lucene
faceting would probably require an additional field property in the
schema to let Solr know that it should add category paths to documents?
(Please correct me if anything I wrote here is wrong.)
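
To make the contrast above concrete, here is a minimal sketch in plain
Java. It is a toy model under stated assumptions, not the Lucene facet
module or Solr's SimpleFacets: FacetSketch, ToyFacetIndex,
countAtSearchTime and the "field/value" path format are all hypothetical
names made up for illustration. The search-time method can count any
field on demand, while the index-time class can only count the fields
that were declared before documents were added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these classes are Lucene or Solr APIs.
public class FacetSketch {

    // Search-time style: count the values of ANY field on demand by walking
    // the documents (a crude stand-in for un-inverting an indexed field).
    static Map<String, Integer> countAtSearchTime(List<Map<String, String>> docs, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(field);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Index-time style: the facet fields are fixed when documents are added.
    // Each "field/value" path gets an int ordinal in a side taxonomy.
    static class ToyFacetIndex {
        private final List<String> facetFields;
        private final Map<String, Integer> ordinalByPath = new HashMap<>();
        private final List<String> pathByOrdinal = new ArrayList<>();
        private final List<List<Integer>> ordinalsPerDoc = new ArrayList<>();

        ToyFacetIndex(List<String> facetFields) {
            this.facetFields = facetFields;
        }

        void addDocument(Map<String, String> doc) {
            List<Integer> ordinals = new ArrayList<>();
            for (String field : facetFields) {
                String value = doc.get(field);
                if (value == null) {
                    continue;
                }
                String path = field + "/" + value;
                Integer ord = ordinalByPath.get(path);
                if (ord == null) {
                    // First time this path is seen: assign the next ordinal.
                    ord = pathByOrdinal.size();
                    ordinalByPath.put(path, ord);
                    pathByOrdinal.add(path);
                }
                ordinals.add(ord);
            }
            ordinalsPerDoc.add(ordinals);
        }

        // Counting is a pass over precomputed ordinals; fields that were not
        // declared at indexing time have no ordinals to count.
        Map<String, Integer> countFacets() {
            int[] counts = new int[pathByOrdinal.size()];
            for (List<Integer> ordinals : ordinalsPerDoc) {
                for (int ord : ordinals) {
                    counts[ord]++;
                }
            }
            Map<String, Integer> result = new TreeMap<>();
            for (int ord = 0; ord < counts.length; ord++) {
                result.put(pathByOrdinal.get(ord), counts[ord]);
            }
            return result;
        }
    }

    // Helper to build a two-field toy document.
    static Map<String, String> doc(String author, String year) {
        Map<String, String> d = new HashMap<>();
        d.put("author", author);
        d.put("year", year);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("alice", "2011"));
        docs.add(doc("bob", "2012"));
        docs.add(doc("alice", "2012"));

        // Only "author" is declared as a facet field at indexing time.
        ToyFacetIndex index = new ToyFacetIndex(Arrays.asList("author"));
        for (Map<String, String> d : docs) {
            index.addDocument(d);
        }

        System.out.println("search-time, year: " + countAtSearchTime(docs, "year"));
        System.out.println("index-time facets: " + index.countFacets());
    }
}
```

In this toy model, getting "year" counts out of ToyFacetIndex means
rebuilding it with "year" in the declared facet fields, which is the
analogue of the reindexing concern raised above.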

I have a few questions regarding the faceting module:
- Do you have any rough idea of how speed and memory usage vary
depending on the number of docs to collect, the number of distinct
field values, etc.?
- TaxonomyReader seems to use ints as ordinals for category paths,
does it mean that the faceting module can't handle paths that have
more than 2B distinct values? Is it fixable? (Or maybe it doesn't make
sense to handle such large numbers of distinct values?)
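
For scale, the "2B" figure in the second question is the range of a Java
int: 2^31 - 1 = 2,147,483,647. The sketch below only illustrates that
arithmetic, assuming ordinals are non-negative ints handed out one at a
time; OrdinalLimit and nextOrdinal are hypothetical names, not part of
TaxonomyReader or any Lucene API.

```java
public class OrdinalLimit {

    // Hypothetical allocator: returns the ordinal after lastOrdinal and
    // fails loudly instead of wrapping around to a negative number.
    static int nextOrdinal(int lastOrdinal) {
        return Math.addExact(lastOrdinal, 1); // throws ArithmeticException on overflow
    }

    public static void main(String[] args) {
        System.out.println("Largest int ordinal: " + Integer.MAX_VALUE);
        System.out.println("Ordinal after 41: " + nextOrdinal(41));
        try {
            nextOrdinal(Integer.MAX_VALUE);
        } catch (ArithmeticException e) {
            System.out.println("Ordinal space exhausted: " + e.getMessage());
        }
    }
}
```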

[1] https://issues.apache.org/jira/browse/SOLR-1581

--
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
