[jira] [Commented] (LUCENE-3079) Faceting module

Toke Eskildsen (JIRA) Mon, 27 Jun 2011 04:41:16 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055480#comment-13055480
 ]


Toke Eskildsen commented on LUCENE-3079:
----------------------------------------

SOLR-2412/LUCENE-2369 were designed around three trade-offs: (relatively) long 
startup, low memory use, and high performance. When the index is (re)opened, 
the hierarchy is analyzed by iterating the terms. This could be offloaded to 
index time, but it would still mean iterating the entire term list after each 
change. That does not play well with real-time use, but should be a nice fit 
for large indexes with a low update rate.
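
To make the open-time analysis concrete, here is a minimal sketch of what a 
single pass over the sorted path terms might look like. It is not the code from 
SOLR-2412/LUCENE-2369: the class HierarchyScan, the Entry type, and the "/" 
delimiter are hypothetical, and a plain SortedSet stands in for the index's 
term list. For each concrete path it records the depth and how many leading 
levels it shares with the previous term, which is enough to recover the tree 
structure later without storing the intermediate nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.regex.Pattern;

/** Hypothetical sketch: analyze a hierarchy in one pass over sorted path terms. */
public class HierarchyScan {

    /** One concrete path, its depth, and the levels shared with the previous term. */
    public static final class Entry {
        public final String path;
        public final int depth;
        public final int sharedWithPrevious;

        Entry(String path, int depth, int sharedWithPrevious) {
            this.path = path;
            this.depth = depth;
            this.sharedWithPrevious = sharedWithPrevious;
        }
    }

    /**
     * Iterates the whole term list once. This is the (relatively) long startup
     * step: it has to be repeated whenever the term list changes.
     */
    public static List<Entry> analyze(SortedSet<String> terms, String delimiter) {
        List<Entry> entries = new ArrayList<>();
        String[] previous = new String[0];
        String splitter = Pattern.quote(delimiter);
        for (String term : terms) {
            String[] levels = term.split(splitter, -1);
            // Count the leading levels this term has in common with the previous one.
            int shared = 0;
            while (shared < previous.length
                    && shared < levels.length
                    && previous[shared].equals(levels[shared])) {
                shared++;
            }
            entries.add(new Entry(term, levels.length, shared));
            previous = levels;
        }
        return entries;
    }
}
```

The cost is linear in the number of terms, which is why this fits a large index 
with few updates better than one that changes constantly.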

As for speed, my theory is that the sparser hierarchy (only the concrete paths) 
wins due to less counting, but without another solution to compare against, it 
has so far remained a theory. There are some measurements at 
https://sbdevel.wordpress.com/2010/10/11/hierarchical-faceting/ but I find that 
for hierarchical faceting, small changes to the test setup can easily have vast 
implications for performance, so they are not comparable to your 
million-document test.
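
The "less counting" idea can be illustrated with a small sketch. It is only an 
illustration under assumptions, not the implementation behind the measurements: 
SparsePathCounter is a hypothetical name, each hit is assumed to carry exactly 
one concrete path, and a list of strings stands in for the per-document values. 
The point is that the per-hit loop touches one counter per hit, and the 
ancestors are filled in afterwards with one roll-up per distinct path instead 
of one increment per ancestor per hit.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: count concrete paths only, then roll up to ancestors. */
public class SparsePathCounter {

    /**
     * Assumes every hit contributes exactly one concrete path. With several
     * paths per document, the rolled-up ancestor counts would over-count.
     */
    public static Map<String, Integer> count(List<String> hitPaths, char delimiter) {
        // Step 1: one increment per hit, on the concrete path only.
        Map<String, Integer> concrete = new TreeMap<>();
        for (String path : hitPaths) {
            concrete.merge(path, 1, Integer::sum);
        }
        // Step 2: one roll-up per distinct concrete path.
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> e : concrete.entrySet()) {
            String path = e.getKey();
            totals.merge(path, e.getValue(), Integer::sum);
            int cut = path.lastIndexOf(delimiter);
            while (cut > 0) {
                // Strip the last level and add the count to the ancestor.
                path = path.substring(0, cut);
                totals.merge(path, e.getValue(), Integer::sum);
                cut = path.lastIndexOf(delimiter);
            }
        }
        return totals;
    }
}
```

For the hits A/B/C, A/B/C, A/B/D and A/E with '/' as delimiter, the first step 
does four increments, and the roll-up then yields A=4, A/B=3, A/B/C=2, A/B/D=1 
and A/E=1.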

> Faceting module
> ---------------
>
>                 Key: LUCENE-3079
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3079
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-3079.patch
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big reasons 
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed to rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of knowledge solr has because 
> it "owns" the index that no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring.  It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not
> cut Solr over at first.  We should iterate on the module, fold in
> improvements, etc., and then, once we can fully verify that cutting
> over doesn't hurt Solr (i.e., lose functionality or performance), we
> can cut over.

