Alastair Porter created SOLR-17346:
--------------------------------------
Summary: Synchronise default configset stopwords to the same list
as lucene
Key: SOLR-17346
URL: https://issues.apache.org/jira/browse/SOLR-17346
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Alastair Porter
Solr's default configset comes with a collection of sample stopwords from the
Snowball project in solr/server/solr/configsets/_default/conf/lang
(https://github.com/apache/solr/tree/a42c605fb916439222a086356f368f02cf80304a/solr/server/solr/configsets/_default/conf/lang).
There is a similar set of stopword files in the Lucene repository, but these
have been updated to a more recent version of the Snowball lists
(https://github.com/apache/lucene/tree/main/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball).
Specifically, the most recent French stopword list has removed a number of
words that are homonyms of other, useful words and so shouldn't be skipped.
In a discussion on the solr-users mailing list it was agreed that it would be a
good idea to sync the stopword files in Solr with the ones in Lucene.
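
To get a sense of how far the two copies have drifted, something like the
following could be used to compare one pair of files. This is only a rough
sketch: it assumes both files use the Snowball stopword format (everything
after a "|" on a line is a comment, with zero or more words before it), and
the function names are made up for illustration.

```python
from pathlib import Path


def parse_snowball_stopwords(text):
    """Return the set of words in Snowball-format stopword text.

    Assumes everything after '|' on a line is a comment and that a line
    may hold zero or more whitespace-separated words.
    """
    words = set()
    for line in text.splitlines():
        content = line.split("|", 1)[0]  # drop the trailing comment, if any
        words.update(content.split())
    return words


def diff_stopwords(solr_text, lucene_text):
    """Return (words only in the Solr text, words only in the Lucene text)."""
    solr_words = parse_snowball_stopwords(solr_text)
    lucene_words = parse_snowball_stopwords(lucene_text)
    return sorted(solr_words - lucene_words), sorted(lucene_words - solr_words)


def diff_stopword_files(solr_path, lucene_path):
    """Same comparison, reading two files from disk (assumed to be UTF-8)."""
    solr_text = Path(solr_path).read_text(encoding="utf-8")
    lucene_text = Path(lucene_path).read_text(encoding="utf-8")
    return diff_stopwords(solr_text, lucene_text)
```

Calling diff_stopword_files() with the French file from each checkout would
return the words present only on the Solr side and the words present only on
the Lucene side. The exact file names in the two directories are not given
above, so they would need to be checked in the repositories.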
--
This message was sent by Atlassian Jira
(v8.20.10#820010)