I can argue both ways as usual. Stopwords may have started as a way to cope with limited space/memory, but are things really any different now? We keep shoving more and more data into the system, and we still have hardware constraints that can be eased by squeezing out stopwords.
OTOH, how much time and energy do we spend trying to support them? Hmmm, maybe the right thing to do is reconsider how they work. It seems like the pain of supporting them is a consequence of them being a filter; then we get into whether to preserve position info and the like. Would it be easier if we thought of them as pre-processing, before any analysis chain even saw them? It would certainly be easier to explain as "it's as if they never existed" than the present "it depends". This would certainly change behavior, though...

On Aug 29, 2016 18:36, "Walter Underwood" <[email protected]> wrote:

> I’ve never removed stopwords and I started working on search in 1996 at
> Infoseek.
>
> wunder
> Walter Underwood
> [email protected]
> http://observer.wunderwood.org/ (my blog)
>
> On Aug 29, 2016, at 6:32 PM, Alexandre Rafalovitch <[email protected]>
> wrote:
>
> On 30 August 2016 at 08:18, Walter Underwood <[email protected]>
> wrote (on Solr users list):
>
>> Stop word removal is a hack left over from when we were running search
>> engines in 64 kbytes of memory.
>
> If this is a leftover hack, should we start removing it from the
> official examples?
>
> Or do they still have value even with the latest ranking algorithms?
>
> Regards,
>    Alex.
> ----
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
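
P.S. To make the filter-vs-preprocessing distinction concrete, here is a minimal sketch in plain Python (not Lucene/Solr API code; the function names and stopword list are made up for illustration). The filter approach drops stopwords after tokenization but can keep each surviving token's original position, while the pre-processing approach removes them before analysis, so positions are renumbered as if the stopwords never existed:

```python
# Hypothetical sketch contrasting the two stopword strategies.
# Not Lucene code; names and the stopword set are invented for illustration.

STOPWORDS = {"the", "of", "a"}

def filter_style(text):
    """Filter approach: tokenize first, then drop stopwords but keep each
    surviving token's original position (analogous to a stop filter that
    preserves position increments)."""
    tokens = text.lower().split()
    return [(pos, tok) for pos, tok in enumerate(tokens) if tok not in STOPWORDS]

def preprocess_style(text):
    """Pre-processing approach: strip stopwords before the analysis chain
    ever sees them, so positions are renumbered "as if they never existed"."""
    kept = [tok for tok in text.lower().split() if tok not in STOPWORDS]
    return list(enumerate(kept))

text = "the quick fox of the forest"
print(filter_style(text))      # [(1, 'quick'), (2, 'fox'), (5, 'forest')]
print(preprocess_style(text))  # [(0, 'quick'), (1, 'fox'), (2, 'forest')]
```

Note the behavior change: a phrase query like "fox forest" would see the two terms as adjacent under the pre-processing scheme, but separated by a position gap under the filter scheme. That gap is exactly the kind of "it depends" we'd be trading away.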
