Has anyone already given some thought to refining Solr's stopwords.txt for
library collections, particularly finding aids? The words included in the
out-of-the-box stopwords.txt are of very questionable unimportance:
<an and are as at be but by for if in into is it not of on or s such t that the
their then there these they this to was will with>
We were indexing an ID field that has "no." (for "number") as one of its
tokens, and we wanted a query for "no" (where the person omitted the period)
to find the doc. In practice, though, "no" was stripped by the
StopFilterFactory. That is how we stumbled upon this list, and we were a bit
surprised by some of the inclusions (e.g., "will") and exclusions (e.g., "a").
Thanks,
Eric James
Yale University Libraries