I don’t know if Solr/Lucene does this, but it can help to start with the 
shortest posting list (the rarest term). Sometimes you can short-circuit 
evaluation before reading the long lists. The Infoseek Ultra engine did that. 
With all-terms matching (default AND), you can sometimes get to no-hits pretty 
early.
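
To make the idea concrete, here is a rough Python sketch of rarest-term-first 
AND evaluation. It is only an illustration, not how Solr/Lucene or Infoseek 
Ultra actually implement it. The in-memory `postings` dict (term -> sorted doc 
IDs) and the names `contains` and `intersect_all_terms` are made up for this 
example.

```python
from bisect import bisect_left
from typing import Dict, List


def contains(sorted_ids: List[int], doc_id: int) -> bool:
    """Binary-search a sorted posting list instead of scanning all of it."""
    i = bisect_left(sorted_ids, doc_id)
    return i < len(sorted_ids) and sorted_ids[i] == doc_id


def intersect_all_terms(postings: Dict[str, List[int]], terms: List[str]) -> List[int]:
    """All-terms (AND) query: return the doc IDs that contain every term.

    `postings` is a hypothetical in-memory index mapping each term to a
    sorted list of doc IDs.
    """
    if not terms:
        return []

    lists = []
    for term in terms:
        plist = postings.get(term)
        if not plist:
            # A term with no postings means no document can match.
            return []
        lists.append(plist)

    # Shortest posting list (rarest term) first.
    lists.sort(key=len)

    candidates = lists[0]
    for plist in lists[1:]:
        # Probe the longer list only for the surviving candidates.
        candidates = [d for d in candidates if contains(plist, d)]
        if not candidates:
            # Short-circuit: the remaining, longer lists are never touched.
            return []
    return candidates


postings = {
    "the": list(range(1000)),   # very common term, long list
    "solr": [3, 50, 700],
    "infoseek": [50, 2000],
    "zebra": [5000],
}
print(intersect_all_terms(postings, ["the", "solr", "infoseek"]))
print(intersect_all_terms(postings, ["the", "zebra", "solr"]))
```

In the second query the candidate set is already empty after checking "zebra" 
against "solr", so the long list for "the" is never consulted.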

wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/  (my blog)


> On Sep 4, 2016, at 10:01 AM, Doug Turnbull 
> <[email protected]> wrote:
> 
> I see it more as a performance tweak than a relevance thing. Matches on 
> stopwords introduce the potential for many more documents to be scored. 
> 
> Large collections usually should have a high min-should-match, so more than 
> likely queries will have at least one or two non-stopwords that dramatically 
> limit the docs that will be scored. And since large collections are where 
> people have stopwords perf problems, this tends to obviate the performance 
> gains of removing stopwords.
> 
> On Sun, Sep 4, 2016 at 12:08 PM Erick Erickson <[email protected]> wrote:
> Wouldn't most frequent term serve?
> 
> 
> On Sep 4, 2016 08:52, "Alexandre Rafalovitch" <[email protected]> wrote:
> On 4 September 2016 at 22:23, Walter Underwood <[email protected]> wrote:
> > If you do want to use stopwords, I’d index without them, then look at the
> > words with the lowest IDF to make the list.
> 
> That's an interesting approach. Is there an easy way to do that (in Solr?)
> 
> Regards,
>    Alex.
> 
> ----
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
> 
