This is an automated email from the ASF dual-hosted git repository.
houston pushed a commit to branch branch_9x
in repository https://gitbox.apache.org/repos/asf/solr.git
The following commit(s) were added to refs/heads/branch_9x by this push:
new 006e3a85397 SOLR-17346: Synchronise stopwords from snowball with those in lucene (#2533)
006e3a85397 is described below
commit 006e3a85397d8bc1c95f5633c37d765b8685824e
Author: Alastair Porter
AuthorDate: Thu Jul 11 20:52:41 2024 +0200
SOLR-17346: Synchronise stopwords from snowball with those in lucene (#2533)
(cherry picked from commit 991e76171e489e5f655d2dda7b0cab40177e5e57)
---
solr/CHANGES.txt | 2 ++
.../configsets/_default/conf/lang/stopwords_da.txt | 8 +++---
.../configsets/_default/conf/lang/stopwords_de.txt | 6 ++---
.../configsets/_default/conf/lang/stopwords_es.txt | 6 ++---
.../configsets/_default/conf/lang/stopwords_fi.txt | 13 +++++-----
.../configsets/_default/conf/lang/stopwords_fr.txt | 30 +++++++++++-----------
.../configsets/_default/conf/lang/stopwords_hu.txt | 8 +++---
.../configsets/_default/conf/lang/stopwords_it.txt | 6 ++---
.../configsets/_default/conf/lang/stopwords_nl.txt | 8 +++---
.../configsets/_default/conf/lang/stopwords_no.txt | 12 +++------
.../configsets/_default/conf/lang/stopwords_pt.txt | 6 ++---
.../configsets/_default/conf/lang/stopwords_ru.txt | 7 ++---
.../configsets/_default/conf/lang/stopwords_sv.txt | 8 +++---
13 files changed, 60 insertions(+), 60 deletions(-)
diff --git a/solr/CHANGES.txt b/solr/CHANGES.txt
index 1ff0313de8c..8101a9cda27 100644
--- a/solr/CHANGES.txt
+++ b/solr/CHANGES.txt
@@ -44,6 +44,8 @@ Improvements
* SOLR-15591: Make using debugger in Solr easier by avoiding NPE in
ExternalPaths.determineSourceHome. (@charlygrappa via Eric Pugh)
+* SOLR-17346: Synchronise stopwords from snowball with those in Lucene (Alastair Porter via Houston Putman)
+
Optimizations
---------------------
* SOLR-17257: Both Minimize Cores and the Affinity replica placement
strategies would over-gather
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_da.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_da.txt
index 42e6145b98e..6e90e8f1aae 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_da.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_da.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/danish/stop.txt
+ | From https://snowballstem.org/algorithms/danish/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
@@ -60,7 +60,7 @@ hvor | where
eller | or
hvad | what
skal | must/shall etc.
-selv | myself/youself/herself/ourselves etc., even
+selv | myself/yourself/herself/ourselves etc., even
her | here
alle | all/everyone/everybody etc.
vil | will (verb)
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_de.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_de.txt
index 86525e7ae08..804bbbdb010 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_de.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_de.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/german/stop.txt
+ | From https://snowballstem.org/algorithms/german/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_es.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_es.txt
index 487d78c8d56..48bd65ef867 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_es.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_es.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/spanish/stop.txt
+ | From https://snowballstem.org/algorithms/spanish/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_fi.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_fi.txt
index 4372c9a055b..c9ee2f16dc5 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_fi.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_fi.txt
@@ -1,12 +1,12 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/finnish/stop.txt
+ | From https://snowballstem.org/algorithms/finnish/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
-
+
| forms of BE
olla
@@ -48,8 +48,8 @@ me meidän meidät meitä meissä meistä meihin meillä meiltä meille
te teidän teidät teitä teissä teistä teihin teillä teiltä teille | you
he heidän heidät heitä heissä heistä heihin heillä heiltä heille | they
-tämä tämän tätä tässä tästä tähän tallä tältä tälle tänä täksi | this
-tuo tuon tuotä tuossa tuosta tuohon tuolla tuolta tuolle tuona tuoksi | that
+tämä tämän tätä tässä tästä tähän tällä tältä tälle tänä täksi | this
+tuo tuon tuota tuossa tuosta tuohon tuolla tuolta tuolle tuona tuoksi | that
se sen sitä siinä siitä siihen sillä siltä sille sinä siksi | it
nämä näiden näitä näissä näistä näihin näillä näiltä näille näinä näiksi | these
nuo noiden noita noissa noista noihin noilla noilta noille noina noiksi | those
@@ -91,7 +91,6 @@ yli | over, across
| other
kun | when
-niin | so
nyt | now
itse | self
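The NOTE in this file's header says the list must be loaded with `format="snowball"`. In that format:

- everything after a `|` is a comment;
- a single line may hold several words.

This is why the Finnish file can list a whole paradigm on one line. Below is a minimal Python sketch of that parsing rule. The helper `parse_snowball_stopwords` is hypothetical and only approximates Lucene's own loader.

The sketch also shows why the typo fixes in this hunk matter. Stop words match tokens exactly, so the old misspelling `tallä` could never filter the real word `tällä`.

```python
def parse_snowball_stopwords(text: str) -> set[str]:
    """Approximate snowball-format parsing: '|' starts a comment,
    and each remaining line may hold several whitespace-separated words."""
    words: set[str] = set()
    for line in text.splitlines():
        # Drop everything from the first '|' onwards (the comment part).
        content = line.split("|", 1)[0]
        # Blank or comment-only lines contribute no words.
        words.update(content.split())
    return words


old_line = "tämä tämän tätä tässä tästä tähän tallä tältä tälle tänä täksi | this"
new_line = "tämä tämän tätä tässä tästä tähän tällä tältä tälle tänä täksi | this"

old_words = parse_snowball_stopwords(old_line)
new_words = parse_snowball_stopwords(new_line)

# Matching is exact, so only the corrected list contains the real word.
print("tällä" in old_words, "tällä" in new_words)  # False True
```

Comment-only lines, such as the `| NOTE: ...` header, yield no words under this rule, so header edits do not change the effective stop set.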
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_fr.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_fr.txt
index 749abae6846..658ae9c91ac 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_fr.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_fr.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/french/stop.txt
+ | From https://snowballstem.org/algorithms/french/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
@@ -51,7 +51,7 @@ qui | who
sa | his, her (fem)
se | oneself
ses | his (pl)
-son | his, her (masc)
+ | son | his, her (masc). Omitted because it is homonym of "sound"
sur | on
ta | thy (fem)
te | thee
@@ -79,15 +79,15 @@ t | t'
y | there
| forms of être (not including the infinitive):
-été
+ | été - Omitted because it is homonym of "summer"
étée
étées
-étés
+ | étés - Omitted because it is homonym of "summers"
étant
suis
es
-est
-sommes
+ | est - Omitted because it is homonym of "east"
+ | sommes - Omitted because it is homonym of "sums"
êtes
sont
serai
@@ -118,7 +118,7 @@ soyez
soient
fusse
fusses
-fût
+ | fût - Omitted because it is homonym of "tap", like in "beer on tap"
fussions
fussiez
fussent
@@ -130,13 +130,13 @@ eue
eues
eus
ai
-as
+ | as - Omitted because it is homonym of "ace"
avons
avez
ont
aurai
-auras
-aura
+ | auras - Omitted because it is also the name of a kind of wind
+ | aura - Omitted because it is also the name of a kind of wind and homonym of "aura"
aurons
aurez
auront
@@ -147,7 +147,7 @@ auriez
auraient
avais
avait
-avions
+ | avions - Omitted because it is homonym of "planes"
aviez
avaient
eut
@@ -169,8 +169,8 @@ eussent
| Later additions (from Jean-Christophe Deschamps)
ceci | this
-cela | that
-celà | that
+cela | that (added 11 Apr 2012. Omission reported by Adrien Grand)
+celà | that (incorrect, though common)
cet | this
cette | this
ici | here
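This hunk does not delete `son`, `été`, `est` and the other homonyms outright. Instead it turns them into `|` comment lines. Under snowball-format parsing that removes them from the effective stop set, so a token such as `été` ("summer") is no longer discarded.

A minimal Python sketch of the effect follows. It assumes tokens are already lowercased. `parse_snowball_stopwords` and `remove_stopwords` are hypothetical helpers, not Solr or Lucene APIs.

```python
def parse_snowball_stopwords(text: str) -> set[str]:
    """Approximate snowball-format parsing: text after '|' is a comment."""
    words: set[str] = set()
    for line in text.splitlines():
        words.update(line.split("|", 1)[0].split())
    return words


def remove_stopwords(tokens: list[str], stopwords: set[str]) -> list[str]:
    """Keep only tokens that are not exact members of the stop set."""
    return [t for t in tokens if t not in stopwords]


old_list = """son | his, her (masc)
été
est
sur | on"""

new_list = """ | son | his, her (masc). Omitted because it is homonym of "sound"
 | été - Omitted because it is homonym of "summer"
 | est - Omitted because it is homonym of "east"
sur | on"""

tokens = ["été", "sur", "est"]
print(remove_stopwords(tokens, parse_snowball_stopwords(old_list)))  # []
print(remove_stopwords(tokens, parse_snowball_stopwords(new_list)))  # ['été', 'est']
```

Keeping the commented-out lines, instead of dropping them, records why each word was left out.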
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_hu.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_hu.txt
index 37526da8aa9..3fa279eac91 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_hu.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_hu.txt
@@ -1,12 +1,12 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/hungarian/stop.txt
+ | From https://snowballstem.org/algorithms/hungarian/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
-
+
| Hungarian stop word list
| prepared by Anna Tordai
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_it.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_it.txt
index 1219cc773ab..c74160e28ca 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_it.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_it.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/italian/stop.txt
+ | From https://snowballstem.org/algorithms/italian/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_nl.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_nl.txt
index 47a2aeacf6f..48c5515123a 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_nl.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_nl.txt
@@ -1,12 +1,13 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/dutch/stop.txt
+ | From https://snowballstem.org/algorithms/dutch/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
+
| A Dutch stop word list. Comments begin with vertical bar. Each stop
| word is at the start of a line.
@@ -117,3 +118,4 @@ uw | your
iemand | somebody
geweest | been; past participle of 'be'
andere | other
+
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_no.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_no.txt
index a7a2c28ba54..f427609484f 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_no.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_no.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/norwegian/stop.txt
+ | From https://snowballstem.org/algorithms/norwegian/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
@@ -25,7 +25,7 @@ et | a/an
den | it/this/that
til | to
er | is/am/are
-som | who/that
+som | who/which/that
på | on
de | they / you(formal)
med | with
@@ -84,7 +84,6 @@ noen | some
noe | some
ville | would
dere | you
-som | who/which/that
deres | their/theirs
kun | only/just
ja | yes
@@ -129,7 +128,6 @@ mange | many
også | also
slik | just
vært | been
-være | to be
båe | both *
begge | both
siden | since
@@ -155,7 +153,6 @@ hennar | her/hers
hennes | hers
hoss | how *
hossen | how *
-ikkje | not *
ingi | noone *
inkje | noone *
korleis | how *
@@ -177,7 +174,6 @@ noka | some (fem.) *
nokor | some *
noko | some *
nokre | some *
-si | his/hers *
sia | since *
sidan | since *
so | so *
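In the Norwegian list, the commit drops the second `som` entry and folds its gloss into the first. The loaded stop words form a set, so the duplicate did not change filtering, but it made the file harder to review.

A small duplicate check for snowball-format lists could look like the sketch below. `find_duplicates` is a hypothetical illustrative helper, not part of Solr.

```python
from collections import Counter


def find_duplicates(text: str) -> list[str]:
    """Return words that appear more than once in a snowball-format list."""
    counts: Counter = Counter()
    for line in text.splitlines():
        # Ignore the '|' comment and count each remaining word.
        counts.update(line.split("|", 1)[0].split())
    return sorted(word for word, n in counts.items() if n > 1)


old_excerpt = """er | is/am/are
som | who/that
dere | you
som | who/which/that
deres | their/theirs"""

print(find_duplicates(old_excerpt))  # ['som']
```
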
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_pt.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_pt.txt
index acfeb01af6b..d03d7f234d5 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_pt.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_pt.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/portuguese/stop.txt
+ | From https://snowballstem.org/algorithms/portuguese/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_ru.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_ru.txt
index 55271400c64..65512d49dbd 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_ru.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_ru.txt
@@ -1,12 +1,13 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/russian/stop.txt
+ | From https://snowballstem.org/algorithms/russian/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
+
| a russian stop word list. comments begin with vertical bar. each stop
| word is at the start of a line.
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_sv.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_sv.txt
index 096f87f6766..d1d0d100880 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_sv.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_sv.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/swedish/stop.txt
+ | From https://snowballstem.org/algorithms/swedish/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
@@ -120,7 +120,7 @@ vilka | who, that
ditt | thy
vem | who
vilket | who, that
-sitta | his
+sitt | his
sådana | such a
vart | each
dina | thy
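The hunks in this commit fall into two kinds:

- most only touch `|` comments, such as the new snowballstem.org URLs and the Danish `yourself` gloss;
- a few change actual entries, such as Swedish `sitta` becoming `sitt`.

One way to tell them apart is to compare the effective word sets of the old and new versions. The Python sketch below does this. `stopword_delta` and `parse_snowball_stopwords` are hypothetical helpers that approximate snowball-format parsing.

```python
def parse_snowball_stopwords(text: str) -> set[str]:
    """Approximate snowball-format parsing: text after '|' is a comment."""
    words: set[str] = set()
    for line in text.splitlines():
        words.update(line.split("|", 1)[0].split())
    return words


def stopword_delta(old: str, new: str) -> tuple[set[str], set[str]]:
    """Return (removed, added) words between two versions of a list."""
    old_words = parse_snowball_stopwords(old)
    new_words = parse_snowball_stopwords(new)
    return old_words - new_words, new_words - old_words


sv_old = "vilket | who, that\nsitta | his\nsådana | such a"
sv_new = "vilket | who, that\nsitt | his\nsådana | such a"
da_old = "selv | myself/youself/herself/ourselves etc., even"
da_new = "selv | myself/yourself/herself/ourselves etc., even"

print(stopword_delta(sv_old, sv_new))  # ({'sitta'}, {'sitt'})
print(stopword_delta(da_old, da_new))  # (set(), set())
```

An empty delta means the edit cannot change which tokens the filter removes.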