I definitely think it should be configurable. Could you open a ticket in Bugzilla?

--
Kevin A. McGrail
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Tue, Jun 11, 2019 at 8:26 AM Shreyansh Shrivastava. <[email protected]> wrote:

> On Tue, 11 Jun 2019, 17:41 RW, <[email protected]> wrote:
>
>> On Tue, 11 Jun 2019 13:43:35 +0300
>> Henrik K wrote:
>>
>> > Does the current stoplist actually do anything useful? Someone
>> > should try 10-fold cross validation with and without..
>>
>> My understanding is that it was intended purely as a speed-up.
>
> Speedup plus less storage was the reason for removing stop words.
>
>> The words
>> are chosen to be neutral tokens that won't affect the final result.
>
> These words will result in neutral tokens only when the user's primary
> language is English. For a Spanish user, an English mail is highly likely
> to be spam, hence we shouldn't remove stop words in this case.
