Thanks. Now back to thinking about Lucene vs. Solr facets in Solr.

-- Jack Krupansky

From: Shai Erera 
Sent: Thursday, December 13, 2012 10:45 AM
To: dev@lucene.apache.org 
Subject: Re: Solr faceting vs. Lucene faceting

Hi Jack,

> Are Lucene facets "static" in some/any sense?

Lucene facets are not static in any way. The taxonomy is built on-the-fly, as 
documents are added to the index. You could say that it's 'discovered' as you 
add documents.
The facets come with a rich userguide: 
http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-files/userguide.html
I also wrote a few posts on it: http://shaierera.blogspot.com

> What decisions does an app developer need to make upfront

Well, as an app developer, you currently need to decide up front what facets 
your documents will have. A document may not contain all the facets, but you 
cannot say "hey, I added an Author field, now I want to facet on it". The 
reason is that in order to facet on it, the values that you put under Author 
need to be added to the taxonomy and resolved to an ordinal. Then those ords 
are written in the search index, in a way that enables very fast and efficient 
aggregations.
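
To make the "resolved to an ordinal" part concrete, here is a rough sketch in
plain Java. It is a toy model only and does not use the facet module's API;
ToyTaxonomy and all of its methods are made-up names. It assumes that each
category path (and each of its parents) gets the next free integer the first
time it is seen, that documents store only those integers, and that counting
is an array increment per stored ordinal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a taxonomy (hypothetical; NOT the Lucene facet module API). */
class ToyTaxonomy {
    private final Map<String, Integer> ordinals = new HashMap<>();
    private final List<String> paths = new ArrayList<>();
    private final List<int[]> docOrdinals = new ArrayList<>();

    /** Returns the ordinal of a path, adding it (parents first) when unseen. */
    int addCategory(String path) {
        Integer existing = ordinals.get(path);
        if (existing != null) {
            return existing;
        }
        int slash = path.lastIndexOf('/');
        if (slash > 0) {
            addCategory(path.substring(0, slash)); // "discover" the parent too
        }
        int ord = paths.size();
        ordinals.put(path, ord);
        paths.add(path);
        return ord;
    }

    /** "Indexes" a document: only the resolved ordinals are stored. */
    void addDocument(String... categoryPaths) {
        int[] ords = new int[categoryPaths.length];
        for (int i = 0; i < categoryPaths.length; i++) {
            ords[i] = addCategory(categoryPaths[i]);
        }
        docOrdinals.add(ords);
    }

    /** Counts the ordinals stored on the documents (parents are not rolled up). */
    int[] countAll() {
        int[] counts = new int[paths.size()];
        for (int[] ords : docOrdinals) {
            for (int ord : ords) {
                counts[ord]++;
            }
        }
        return counts;
    }

    int size() {
        return paths.size();
    }

    String path(int ord) {
        return paths.get(ord);
    }

    public static void main(String[] args) {
        ToyTaxonomy taxonomy = new ToyTaxonomy();
        taxonomy.addDocument("Author/Shai", "Date/2012/12");
        taxonomy.addDocument("Author/Shai");
        int[] counts = taxonomy.countAll();
        for (int ord = 0; ord < counts.length; ord++) {
            System.out.println(ord + " " + taxonomy.path(ord) + " -> " + counts[ord]);
        }
    }
}
```

In this toy, a field whose values were never passed through addCategory has
no ordinals stored on the documents, so there is nothing to count for it.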

Also, if you're going to do more than just counting (see my first post - intro 
to facets), you're going to need to index the facets in a special way (I intend 
to write a blog about that too, w/ example code).
But I guess that's expected, right? Like, you cannot add a 'price' field to the 
index as String values, and suddenly expect to be able to do efficient range 
queries on it.
As an app developer, you'll recognize that when writing your app and will add 
the field as a numeric field.
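
A small plain-Java illustration of one reason the 'price' type has to be
chosen up front: strings compare lexicographically, so a range over string
values does not match the numeric range. This is only a sketch; it does not
use Lucene, and PriceRangeSketch and its methods are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: the same range filter over strings vs. numbers. */
class PriceRangeSketch {

    /** Keeps prices whose string form lies in [lo, hi] lexicographically. */
    static List<String> rangeAsStrings(List<String> prices, String lo, String hi) {
        List<String> hits = new ArrayList<>();
        for (String price : prices) {
            if (price.compareTo(lo) >= 0 && price.compareTo(hi) <= 0) {
                hits.add(price);
            }
        }
        return hits;
    }

    /** Keeps prices whose parsed integer value lies in [lo, hi]. */
    static List<String> rangeAsNumbers(List<String> prices, int lo, int hi) {
        List<String> hits = new ArrayList<>();
        for (String price : prices) {
            int value = Integer.parseInt(price);
            if (value >= lo && value <= hi) {
                hits.add(price);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> prices = Arrays.asList("5", "20", "100", "1000");
        // Lexicographic range "10".."200" also lets "1000" through.
        System.out.println(rangeAsStrings(prices, "10", "200"));
        // Numeric range 10..200 keeps only 20 and 100.
        System.out.println(rangeAsNumbers(prices, 10, 200));
    }
}
```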

> and can only be changed with a full reindex of the data?

As with regular Lucene fields, if you suddenly decide to make a change to your 
taxonomy, e.g. that category A/C now needs to be under A/B/C, then yes, you 
will need to re-index the documents that were previously associated w/ A/C. But 
now that we're making progress w/ field-level updates (see LUCENE-4258), 
perhaps in the future you won't need to do so.

> I'm trying to get a handle on whether Lucene Facets is a guru-level feature...

Absolutely not! Lucene facets allow you to do very complicated things, but you 
can also get started w/ a faceted index in, I'd say, even less than 5 minutes.
Look at this post 
(http://shaierera.blogspot.com/2012/11/lucene-facets-part-2.html). You can 
copy-paste the code (over current trunk) and get an impression of what it's like to 
index facets w/ Lucene.
Also, Mike McCandless and I are working on lots of simplifications now, 
including some specialized code paths for common use cases. You can follow 
LUCENE-4619.

> is it the kind of feature that is mainly of interest to the developers of 
> higher-level search platforms such as Solr and ElasticSearch as opposed to 
> the users of those platforms

Again, absolutely not! Well, it's true that in order to get the real value out 
of faceted search you need to at least have a User Interface that shows you the 
returned facets, weights etc.
But there's nothing in the module that restricts you from working with it as-is.

Hope I answered all your questions.

Shai




On Thu, Dec 13, 2012 at 4:28 PM, Jack Krupansky <j...@basetechnology.com> wrote:

  "the lucene module requires users to decide at indexing time what and how to 
facet whereas Solr does everything at searching time"


  It would be nice to have some confirmation/clarification of that - Are Lucene 
facets "static" in some/any sense? What decisions does an app developer need to 
make upfront and can only be changed with a full reindex of the data?

  I'm trying to get a handle on whether Lucene Facets is a guru-level feature 
or something that an average Lucene user can trivially master with say 5 
minutes of reading. Or is it the kind of feature that is mainly of interest to 
the developers of higher-level search platforms such as Solr and ElasticSearch 
as opposed to the users of those platforms?

  -- Jack Krupansky

  -----Original Message----- From: Adrien Grand
  Sent: Thursday, December 13, 2012 7:03 AM
  To: dev@lucene.apache.org
  Subject: Re: Solr faceting vs. Lucene faceting 


  Hi Shai,

  On Thu, Dec 13, 2012 at 12:21 PM, Shai Erera <ser...@gmail.com> wrote:

    As I said, if someone volunteers to do some work on the Solr side, I will
    gladly participate in that effort.
    I just don't even know where to start w/ Solr :).


  The entry point for Solr facets is
  org.apache.solr.request.SimpleFacets.getFacetCounts (called from
  FacetComponent).


    One thing that would be really great is if we can build an adapter (I think
    someone mentioned that word here)
    which supports basic facets capabilities, so that we can at least benchmark
    Solr's current
    implementation vs the implementation w/ the module.


  Comparing both impls would be great but an adapter might be hard to
  write given how Lucene faceting differs from Solr faceting: the lucene
  module requires users to decide at indexing time what and how to facet
  whereas Solr does everything at searching time (there is even an issue
  open in order to be able to compute facet counts based on arbitrary
  functions [1]) using FieldCache and UnInvertedField (meaning that you
  can compute facets on any field that is indexed). So Lucene faceting
  would probably require an additional field property in the schema to
  let Solr know that it should add category paths to documents? (Please
  correct me if anything I wrote here is wrong).

  I have a few questions regarding the faceting module:
  - do you have any rough idea of how speed and memory usage vary
  depending on the number of docs to collect, distinct field values,
  etc. ?
  - TaxonomyReader seems to use ints as ordinals for category paths,
  does it mean that the faceting module can't handle paths that have
  more than 2B distinct values? Is it fixable? (Or maybe it doesn't make
  sense to handle such large numbers of distinct values?)

  [1] https://issues.apache.org/jira/browse/SOLR-1581

  --
  Adrien

  ---------------------------------------------------------------------
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org 

