[jira] [Commented] (LUCENE-3079) Facetiing module

Shai Erera (JIRA) Mon, 27 Jun 2011 03:01:29 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055443#comment-13055443 ]


Shai Erera commented on LUCENE-3079:
------------------------------------

Thanks, Toke, for the pointer. I think it's very interesting. We've actually
explored in the past storing just the category/leaf, instead of the entire
hierarchy, in the document. The search response time was much slower than what
I reported above (nearly a 2x slowdown). While storing the entire hierarchy
indeed consumes more space, it performs better at search time. We figure that
space is cheap today, and that search apps are usually more interested in
faster search response times and are willing to spend some more time in the
indexing and analysis stages.
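To make the tradeoff concrete, here is a minimal, hypothetical sketch (plain Java, not the facet module's actual API; all class and method names are made up for illustration). It assumes category paths are '/'-separated strings. Storing the full hierarchy means expanding A/B/C into A, A/B and A/B/C once at indexing time, so counting is a single increment per stored entry. Storing only leaves means that expansion (plus per-document de-duplication) has to be redone for every hit at search time, which is where the extra search cost comes from.

```java
import java.util.*;

/** Hypothetical illustration only; not the Lucene facet module API. */
public class HierarchyCountingSketch {

    /** Expands "A/B/C" into ["A", "A/B", "A/B/C"]. */
    static List<String> ancestorsAndSelf(String path) {
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (String part : path.split("/")) {
            if (sb.length() > 0) sb.append('/');
            sb.append(part);
            out.add(sb.toString());
        }
        return out;
    }

    /** Indexing time: store every ancestor of every category, de-duplicated per document. */
    static Set<String> indexFullHierarchy(List<String> docCategories) {
        Set<String> stored = new LinkedHashSet<>();
        for (String cat : docCategories) stored.addAll(ancestorsAndSelf(cat));
        return stored;
    }

    /** Search time, full hierarchy stored: one increment per stored entry, no path walking. */
    static Map<String, Integer> countFromFullHierarchy(List<Set<String>> hits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> doc : hits)
            for (String cat : doc) counts.merge(cat, 1, Integer::sum);
        return counts;
    }

    /** Search time, only leaves stored: ancestors are rebuilt and de-duplicated for every hit. */
    static Map<String, Integer> countFromLeavesOnly(List<List<String>> hits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> doc : hits) {
            Set<String> seen = new HashSet<>();
            for (String leaf : doc) seen.addAll(ancestorsAndSelf(leaf)); // extra per-hit work
            for (String cat : seen) counts.merge(cat, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(List.of("A/B/C"), List.of("A/B/D", "A/E"));
        List<Set<String>> full = new ArrayList<>();
        for (List<String> d : docs) full.add(indexFullHierarchy(d));
        System.out.println(countFromFullHierarchy(full));
        System.out.println(countFromLeavesOnly(docs));
    }
}
```

Both lines print {A=2, A/B=2, A/B/C=1, A/B/D=1, A/E=1}: the counts are identical, and only the point at which the hierarchy is expanded (index time vs. search time) differs.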

Nevertheless, the link you provided proposes an interesting way to manage the
hierarchy, and I think it's worth exploring at some point. It could be that it
performs better than our approach did when we indexed just the leaf
category for each document. We'd also need to see how to update the taxonomy on
the fly. For example, it describes that for A/B/C you know that its level is 3
(that's easy) and that the previous category/tag that matches (P) is A. But
what if at some point A/B is added to a document? What happens to the data
indexed for the doc w/ A/B/C, whose previous matching category is now A/B?
It's not clear to me, but it could be that I've missed the description in the
proposal.
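A small, hypothetical sketch of the update concern, based only on my reading of the proposal as described above (level plus the "previous matching" category P); the class and method names are made up. It assumes P is the nearest proper ancestor that is itself used as a category somewhere in the index. With only A and A/B/C in use, P for A/B/C is A; once some document is tagged A/B, the same computation yields A/B, so anything stored for A/B/C at indexing time would go stale.

```java
import java.util.*;

/** Hypothetical sketch of one reading of the proposal; not an existing API. */
public class PreviousMatchSketch {

    /** Level = number of path components, e.g. "A/B/C" -> 3. */
    static int level(String path) {
        return path.split("/").length;
    }

    /** Nearest proper ancestor of path that is itself in use as a category, or null if none. */
    static String previousMatch(String path, Set<String> usedCategories) {
        int cut = path.lastIndexOf('/');
        while (cut > 0) {
            String ancestor = path.substring(0, cut);
            if (usedCategories.contains(ancestor)) return ancestor;
            cut = ancestor.lastIndexOf('/');
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> used = new TreeSet<>(List.of("A", "A/B/C"));
        System.out.println(level("A/B/C") + " " + previousMatch("A/B/C", used)); // 3 A
        used.add("A/B"); // some document is now tagged A/B
        System.out.println(level("A/B/C") + " " + previousMatch("A/B/C", used)); // 3 A/B
    }
}
```

The level never changes, but P does. Under this reading, either P is computed at search time or the stored entries for A/B/C documents would need to be rewritten whenever a new intermediate category appears.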

I am very close to uploading the patch. Hopefully I'll upload it by the end of 
my day.

> Facetiing module
> ----------------
>
>                 Key: LUCENE-3079
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3079
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-3079.patch
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big reasons why
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed to rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of knowledge Solr has because 
> it "owns" the index that no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring.  It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not
> cut Solr over at first.  We should iterate on the module, fold in
> improvements, etc., and then, once we can fully verify that cutting
> over doesn't hurt Solr (i.e., lose functionality or performance), we can
> cut over.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
