Solr has its PathHierarchyTokenizer, which can tokenise:

/books/computers/programming
into
/books
/books/computers
/books/computers/programming

You can facet on that. Of course, part of the work is done at index
time, which appears to be no different to the Lucene faceting method, at
least for hierarchical facets. It doesn't support depth prefixes at
present, but that would be trivial to add.
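
To make the idea concrete, here is a rough sketch in plain Python (not
Solr or Lucene code). The function names are made up for illustration,
the depth-prefixed form is only my guess at what such a feature could
look like, and it assumes paths start with the delimiter:

```python
from collections import Counter


def path_hierarchy_tokens(path, delimiter="/"):
    """Emit one token per ancestor of `path`, shortest first.

    Assumption: the path starts with the delimiter, as in the example above.
    """
    parts = [p for p in path.split(delimiter) if p]
    return [delimiter + delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


def depth_prefixed_tokens(path, delimiter="/"):
    """Hypothetical variant: prefix each token with its depth in the hierarchy."""
    tokens = path_hierarchy_tokens(path, delimiter)
    return [f"{depth}{token}" for depth, token in enumerate(tokens, start=1)]


def facet_counts(paths):
    """Count documents per hierarchy token; each document counts once per token."""
    counts = Counter()
    for path in paths:
        counts.update(set(path_hierarchy_tokens(path)))
    return counts


if __name__ == "__main__":
    print(path_hierarchy_tokens("/books/computers/programming"))
    print(depth_prefixed_tokens("/books/computers/programming"))
    print(facet_counts(["/books/computers/programming", "/books/cooking"]))
```

With depth prefixes, restricting facet values to a prefix such as "2/"
would return only the second level of the hierarchy.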

Upayavira

On Thu, Dec 13, 2012, at 09:44 PM, Adrien Grand wrote:
> Hi Shai,
> 
> Thanks for your answers!
> 
> On Thu, Dec 13, 2012 at 5:05 PM, Shai Erera <ser...@gmail.com> wrote:
> >> the lucene module requires users to decide at indexing time what and how
> >> to facet
> >> whereas Solr does everything at searching time
> >
> > True, that's one difference between the two implementations today, even
> > though I think that we can create a specialized path (under LUCENE-4619) for
> > really simple, non-hierarchical cases.
> > I don't know if and how Solr can handle a field value
> > Sport/Basketball/NBA/... -- i.e., how is the hierarchy broken?
> 
> Solr doesn't break hierarchies. Its closest concept is pivot faceting
> (https://issues.apache.org/jira/browse/SOLR-2894), available since 4.0,
> which allows you to compute hierarchical facets on the fly. For
> example, you can compute brand counts per category (if both brand and
> category are indexed).
> 
> > Making a decision at search time that you'd like to facet on a field ...
> > well I think that not doing that is what allows us to do efficient faceted
> > search, off-disk or in-memory, support really large indexes and taxonomies
> > and be NRT.
> 
> Maybe it would be less efficient (or not?), but I think this kind of
> flexibility can be great for some applications (I'm thinking of
> analytics right now, but there are probably many other use-cases). To
> me the main issues with Solr faceting right now are that it consumes a
> lot of memory and is not NRT-friendly because of uninversion time. But
> I think this can be fixed by using doc values (because they can be
> stored on disk and don't need to be uninverted) instead of the field
> cache. I would really love for the faceting module to become flexible
> enough to handle both index-time and search-time facets so
> that Solr could become a consumer of this API instead of implementing
> its own faceting logic.
> 
> > So I think that if anyone would want to really manage taxonomies of that
> > size, we'd need to discuss and maybe get back to the drawing board :).
> 
> One use-case I'm thinking of is finding the top terms of documents
> that match an arbitrary query. This can be very useful to help you
> better understand your data, but in this case the number of distinct
> values is the size of your term dictionary.
> 
> -- 
> Adrien
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 
