Lucene does this -- there is a cost() API on the Scorer that is typically the doc freq of the term, and things like BooleanQuery navigate by the lowest-cost Scorer first, leapfrogging the others after.
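To make the leapfrogging concrete, here is a rough sketch in Python. It is not Lucene's actual code: the function name and the plain sorted lists of doc ids are made up for illustration, and list length stands in for cost(). The shortest list leads, and the others are only ever advanced to the lead's current candidate.

```python
from bisect import bisect_left


def leapfrog_intersect(postings):
    """Intersect sorted, duplicate-free doc-id lists, leading with the shortest."""
    if not postings:
        return []
    # Length is a stand-in for cost(); for a term this is roughly the doc freq.
    ordered = sorted(postings, key=len)
    lead, others = ordered[0], ordered[1:]
    positions = [0] * len(others)  # cursor into each of the longer lists
    hits = []
    i = 0
    while i < len(lead):
        candidate = lead[i]
        matched = True
        for k, plist in enumerate(others):
            # Like advance(target): jump to the first doc id >= candidate.
            positions[k] = bisect_left(plist, candidate, positions[k])
            if positions[k] == len(plist):
                return hits  # one list is exhausted, so nothing else can match
            if plist[positions[k]] != candidate:
                # Leapfrog: move the lead up to the doc this list landed on.
                i = bisect_left(lead, plist[positions[k]], i)
                matched = False
                break
        if matched:
            hits.append(candidate)
            i += 1
    return hits
```

With a rare term leading, a long stopword-style list is probed only a handful of times, and an all-terms query gives up as soon as any list runs out.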
On Sun, Sep 4, 2016 at 10:36 PM Walter Underwood <[email protected]> wrote:

> I don’t know if Solr/Lucene does this, but it can be helpful to start with
> the shortest posting list (rarest term). Sometimes you can short-circuit
> evaluation before reading the long lists. The Infoseek Ultra engine did
> that. With all-terms (default AND), you can sometimes get to no-hits pretty
> early.
>
> wunder
> Walter Underwood
> [email protected]
> http://observer.wunderwood.org/ (my blog)
>
>
> On Sep 4, 2016, at 10:01 AM, Doug Turnbull <[email protected]> wrote:
>
> I see it more of a performance tweak than a relevance thing. matches on
> stopwords introduce the potential for many more documents to be scored.
>
> Large collections usually should have a high min-should-match, so more
> than likely queries with at least one or two non-stopwords that
> dramatically limit the docs that will be scored. And since large
> collections are where people have stopwords perf problems, this tends to
> obviate the performance gains of removing stopwords.
>
> On Sun, Sep 4, 2016 at 12:08 PM Erick Erickson <[email protected]>
> wrote:
>
>> Wouldn't most frequent term serve?
>>
>> On Sep 4, 2016 08:52, "Alexandre Rafalovitch" <[email protected]> wrote:
>>
>>> On 4 September 2016 at 22:23, Walter Underwood <[email protected]>
>>> wrote:
>>> > If you do want to use stopwords, I’d index without them, then look at the
>>> > words with the lowest IDF to make the list.
>>>
>>> That's an interesting approach. Is there an easy way to do that (in
>>> Solr?)
>>>
>>> Regards,
>>> Alex.
>>>
>>> ----
>>> Newsletter and resources for Solr beginners and intermediates:
>>> http://www.solr-start.com/
>>>
>
--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
