I don’t know if Solr/Lucene does this, but it can be helpful to start with the shortest posting list (rarest term). Sometimes you can short-circuit evaluation before reading the long lists. The Infoseek Ultra engine did that. With all-terms (default AND), you can sometimes get to no-hits pretty early.
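A rough sketch of the idea, in Python rather than anything Solr/Lucene actually ships. The function name `intersect_rarest_first` and the in-memory `postings` dict are made up for illustration; a real engine would read compressed posting lists from disk and use skip lists instead of `bisect`. It sorts the lists by length, walks the shortest one, and bails out as soon as the candidate set is empty.

```python
from bisect import bisect_left
from typing import Dict, List


def intersect_rarest_first(postings: Dict[str, List[int]], terms: List[str]) -> List[int]:
    """AND query over sorted doc-ID posting lists, rarest term first.

    `postings` is a hypothetical in-memory index: term -> sorted doc IDs.
    """
    lists = []
    for term in terms:
        plist = postings.get(term, [])
        if not plist:
            return []  # a term with no postings means no hits at all
        lists.append(plist)
    if not lists:
        return []

    # Shortest posting list (rarest term) first.
    lists.sort(key=len)
    candidates = lists[0]

    for plist in lists[1:]:
        survivors = []
        pos = 0
        for doc_id in candidates:
            # Binary search forward in the longer list instead of scanning it.
            pos = bisect_left(plist, doc_id, pos)
            if pos == len(plist):
                break  # ran off the end; no later candidate can match
            if plist[pos] == doc_id:
                survivors.append(doc_id)
        candidates = survivors
        if not candidates:
            return []  # short-circuit: the longer lists are never touched
    return candidates


# Toy index: "the" is the long, stopword-like list.
postings = {
    "the": list(range(1000)),
    "solr": [3, 40, 900],
    "infoseek": [40, 2000],
}
print(intersect_rarest_first(postings, ["the", "solr", "infoseek"]))
```

Because the lists are processed from shortest to longest, a query whose rare terms do not co-occur returns before the list for "the" is consulted at all.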
wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/ (my blog)

> On Sep 4, 2016, at 10:01 AM, Doug Turnbull <[email protected]> wrote:
>
> I see it more of a performance tweak than a relevance thing. matches on
> stopwords introduce the potential for many more documents to be scored.
>
> Large collections usually should have a high min-should-match, so more than
> likely queries with at least one or two non-stopwords that dramatically limit
> the docs that will be scored. And since large collections are where people
> have stopwords perf problems, this tends to obviate the performance gains of
> removing stopwords.
>
> On Sun, Sep 4, 2016 at 12:08 PM Erick Erickson <[email protected]> wrote:
> Wouldn't most frequent term serve?
>
> On Sep 4, 2016 08:52, "Alexandre Rafalovitch" <[email protected]> wrote:
> On 4 September 2016 at 22:23, Walter Underwood <[email protected]> wrote:
> > If you do want to use stopwords, I’d index without them, then look at the
> > words with the lowest IDF to make the list.
>
> That's an interesting approach. Is there an easy way to do that (in Solr?)
>
> Regards,
> Alex.
> ----
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
