This is an automated email from the ASF dual-hosted git repository.

houston pushed a commit to branch branch_9x
in repository https://gitbox.apache.org/repos/asf/solr.git


The following commit(s) were added to refs/heads/branch_9x by this push:
     new 006e3a85397 SOLR-17346: Synchronise stopwords from snowball with those in lucene (#2533)
006e3a85397 is described below

commit 006e3a85397d8bc1c95f5633c37d765b8685824e
Author: Alastair Porter <[email protected]>
AuthorDate: Thu Jul 11 20:52:41 2024 +0200

    SOLR-17346: Synchronise stopwords from snowball with those in lucene (#2533)
    
    (cherry picked from commit 991e76171e489e5f655d2dda7b0cab40177e5e57)
---
 solr/CHANGES.txt                                   |  2 ++
 .../configsets/_default/conf/lang/stopwords_da.txt |  8 +++---
 .../configsets/_default/conf/lang/stopwords_de.txt |  6 ++---
 .../configsets/_default/conf/lang/stopwords_es.txt |  6 ++---
 .../configsets/_default/conf/lang/stopwords_fi.txt | 13 +++++-----
 .../configsets/_default/conf/lang/stopwords_fr.txt | 30 +++++++++++-----------
 .../configsets/_default/conf/lang/stopwords_hu.txt |  8 +++---
 .../configsets/_default/conf/lang/stopwords_it.txt |  6 ++---
 .../configsets/_default/conf/lang/stopwords_nl.txt |  8 +++---
 .../configsets/_default/conf/lang/stopwords_no.txt | 12 +++------
 .../configsets/_default/conf/lang/stopwords_pt.txt |  6 ++---
 .../configsets/_default/conf/lang/stopwords_ru.txt |  7 ++---
 .../configsets/_default/conf/lang/stopwords_sv.txt |  8 +++---
 13 files changed, 60 insertions(+), 60 deletions(-)

diff --git a/solr/CHANGES.txt b/solr/CHANGES.txt
index 1ff0313de8c..8101a9cda27 100644
--- a/solr/CHANGES.txt
+++ b/solr/CHANGES.txt
@@ -44,6 +44,8 @@ Improvements
 
 * SOLR-15591: Make using debugger in Solr easier by avoiding NPE in ExternalPaths.determineSourceHome.  (@charlygrappa via Eric Pugh)
 
+* SOLR-17346: Synchronise stopwords from snowball with those in Lucene (Alastair Porter via Houston Putman)
+
 Optimizations
 ---------------------
 * SOLR-17257: Both Minimize Cores and the Affinity replica placement strategies would over-gather
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_da.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_da.txt
index 42e6145b98e..6e90e8f1aae 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_da.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_da.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/danish/stop.txt
+ | From https://snowballstem.org/algorithms/danish/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
@@ -60,7 +60,7 @@ hvor         | where
 eller        | or
 hvad         | what
 skal         | must/shall etc.
-selv         | myself/youself/herself/ourselves etc., even
+selv         | myself/yourself/herself/ourselves etc., even
 her          | here
 alle         | all/everyone/everybody etc.
 vil          | will (verb)
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_de.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_de.txt
index 86525e7ae08..804bbbdb010 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_de.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_de.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/german/stop.txt
+ | From https://snowballstem.org/algorithms/german/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_es.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_es.txt
index 487d78c8d56..48bd65ef867 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_es.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_es.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/spanish/stop.txt
+ | From https://snowballstem.org/algorithms/spanish/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_fi.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_fi.txt
index 4372c9a055b..c9ee2f16dc5 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_fi.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_fi.txt
@@ -1,12 +1,12 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/finnish/stop.txt
+ | From https://snowballstem.org/algorithms/finnish/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
  | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
- 
+
 | forms of BE
 
 olla
@@ -48,8 +48,8 @@ me     meidän meidät meitä  meissä  meistä  meihin meillä  meiltä  meille
 te     teidän teidät teitä  teissä  teistä  teihin teillä  teiltä  teille                | you
 he     heidän heidät heitä  heissä  heistä  heihin heillä  heiltä  heille                | they
 
-tämä   tämän         tätä   tässä   tästä   tähän  tallä   tältä   tälle   tänä   täksi  | this
-tuo    tuon          tuotä  tuossa  tuosta  tuohon tuolla  tuolta  tuolle  tuona  tuoksi | that
+tämä   tämän         tätä   tässä   tästä   tähän  tällä   tältä   tälle   tänä   täksi  | this
+tuo    tuon          tuota  tuossa  tuosta  tuohon tuolla  tuolta  tuolle  tuona  tuoksi | that
 se     sen           sitä   siinä   siitä   siihen sillä   siltä   sille   sinä   siksi  | it
 nämä   näiden        näitä  näissä  näistä  näihin näillä  näiltä  näille  näinä  näiksi | these
 nuo    noiden        noita  noissa  noista  noihin noilla  noilta  noille  noina  noiksi | those
@@ -91,7 +91,6 @@ yli     | over, across
 | other
 
 kun    | when
-niin   | so
 nyt    | now
 itse   | self
 
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_fr.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_fr.txt
index 749abae6846..658ae9c91ac 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_fr.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_fr.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/french/stop.txt
+ | From https://snowballstem.org/algorithms/french/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
@@ -51,7 +51,7 @@ qui            |  who
 sa             |  his, her (fem)
 se             |  oneself
 ses            |  his (pl)
-son            |  his, her (masc)
+ | son            |  his, her (masc). Omitted because it is homonym of "sound"
 sur            |  on
 ta             |  thy (fem)
 te             |  thee
@@ -79,15 +79,15 @@ t              |  t'
 y              |  there
 
                | forms of être (not including the infinitive):
-été
+ | été - Omitted because it is homonym of "summer"
 étée
 étées
-étés
+ | étés - Omitted because it is homonym of "summers"
 étant
 suis
 es
-est
-sommes
+ | est - Omitted because it is homonym of "east"
+ | sommes - Omitted because it is homonym of "sums"
 êtes
 sont
 serai
@@ -118,7 +118,7 @@ soyez
 soient
 fusse
 fusses
-fût
+ | fût - Omitted because it is homonym of "tap", like in "beer on tap"
 fussions
 fussiez
 fussent
@@ -130,13 +130,13 @@ eue
 eues
 eus
 ai
-as
+ | as - Omitted because it is homonym of "ace"
 avons
 avez
 ont
 aurai
-auras
-aura
+ | auras - Omitted because it is also the name of a kind of wind
+ | aura - Omitted because it is also the name of a kind of wind and homonym of "aura"
 aurons
 aurez
 auront
@@ -147,7 +147,7 @@ auriez
 auraient
 avais
 avait
-avions
+ | avions - Omitted because it is homonym of "planes"
 aviez
 avaient
 eut
@@ -169,8 +169,8 @@ eussent
 
                | Later additions (from Jean-Christophe Deschamps)
 ceci           |  this
-cela           |  that
-celà           |  that
+cela           |  that (added 11 Apr 2012. Omission reported by Adrien Grand)
+celà           |  that (incorrect, though common)
 cet            |  this
 cette          |  this
 ici            |  here
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_hu.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_hu.txt
index 37526da8aa9..3fa279eac91 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_hu.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_hu.txt
@@ -1,12 +1,12 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/hungarian/stop.txt
+ | From https://snowballstem.org/algorithms/hungarian/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
  | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
- 
+
 | Hungarian stop word list
 | prepared by Anna Tordai
 
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_it.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_it.txt
index 1219cc773ab..c74160e28ca 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_it.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_it.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/italian/stop.txt
+ | From https://snowballstem.org/algorithms/italian/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_nl.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_nl.txt
index 47a2aeacf6f..48c5515123a 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_nl.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_nl.txt
@@ -1,12 +1,13 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/dutch/stop.txt
+ | From https://snowballstem.org/algorithms/dutch/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
  | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
 
+
  | A Dutch stop word list. Comments begin with vertical bar. Each stop
  | word is at the start of a line.
 
@@ -117,3 +118,4 @@ uw             |  your
 iemand         |  somebody
 geweest        |  been; past participle of 'be'
 andere         |  other
+
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_no.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_no.txt
index a7a2c28ba54..f427609484f 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_no.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_no.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/norwegian/stop.txt
+ | From https://snowballstem.org/algorithms/norwegian/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
@@ -25,7 +25,7 @@ et             | a/an
 den            | it/this/that
 til            | to
 er             | is/am/are
-som            | who/that
+som            | who/which/that
 på             | on
 de             | they / you(formal)
 med            | with
@@ -84,7 +84,6 @@ noen           | some
 noe            | some
 ville          | would
 dere           | you
-som            | who/which/that
 deres          | their/theirs
 kun            | only/just
 ja             | yes
@@ -129,7 +128,6 @@ mange          | many
 også           | also
 slik           | just
 vært           | been
-være           | to be
 båe            | both *
 begge          | both
 siden          | since
@@ -155,7 +153,6 @@ hennar         | her/hers
 hennes         | hers
 hoss           | how *
 hossen         | how *
-ikkje          | not *
 ingi           | noone *
 inkje          | noone *
 korleis        | how *
@@ -177,7 +174,6 @@ noka           | some (fem.) *
 nokor          | some *
 noko           | some *
 nokre          | some *
-si             | his/hers *
 sia            | since *
 sidan          | since *
 so             | so *
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_pt.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_pt.txt
index acfeb01af6b..d03d7f234d5 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_pt.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_pt.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/portuguese/stop.txt
+ | From https://snowballstem.org/algorithms/portuguese/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_ru.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_ru.txt
index 55271400c64..65512d49dbd 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_ru.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_ru.txt
@@ -1,12 +1,13 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/russian/stop.txt
+ | From https://snowballstem.org/algorithms/russian/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
  | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
 
+
  | a russian stop word list. comments begin with vertical bar. each stop
  | word is at the start of a line.
 
diff --git a/solr/server/solr/configsets/_default/conf/lang/stopwords_sv.txt b/solr/server/solr/configsets/_default/conf/lang/stopwords_sv.txt
index 096f87f6766..d1d0d100880 100644
--- a/solr/server/solr/configsets/_default/conf/lang/stopwords_sv.txt
+++ b/solr/server/solr/configsets/_default/conf/lang/stopwords_sv.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/swedish/stop.txt
+ | From https://snowballstem.org/algorithms/swedish/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
@@ -120,7 +120,7 @@ vilka          | who, that
 ditt           | thy
 vem            | who
 vilket         | who, that
-sitta          | his
+sitt           | his
 sådana         | such a
 vart           | each
 dina           | thy
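
For readers unfamiliar with the file format touched by this commit: the notices above say comments begin with a vertical bar, and the Finnish file puts several words on one line. The sketch below is an illustration only and is not part of the commit. It approximates how a reader of the "snowball" format might extract words, which is why the French entries now prefixed with ` | ` stop acting as stop words. The function name `parse_snowball_stopwords` and the sample text are assumptions made for this example.

```python
def parse_snowball_stopwords(text):
    """Return the set of stop words found in Snowball-format text.

    Hypothetical helper for illustration; not Solr or Lucene code.
    """
    words = set()
    for line in text.splitlines():
        # Everything from the first vertical bar onward is a comment.
        entry = line.split("|", 1)[0]
        # A line may hold several whitespace-separated words
        # (as in stopwords_fi.txt), so add each one.
        words.update(entry.split())
    return words


# Sample lines modelled on the files changed in this commit.
sample = """\
 | été - Omitted because it is homonym of "summer"
étée
ceci           |  this
tämä   tämän   tätä   | this
"""

stopwords = parse_snowball_stopwords(sample)
print(sorted(stopwords))
```

Because the commented-out `été` line starts with a vertical bar, it contributes no words, while `étée` on the next line is still picked up.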
