I definitely think it should be configurable. Could you open a ticket in
Bugzilla?
--
Kevin A. McGrail
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171


On Tue, Jun 11, 2019 at 8:26 AM Shreyansh Shrivastava. <
[email protected]> wrote:

>
>
> On Tue, 11 Jun 2019, 17:41 RW, <[email protected]> wrote:
>
>> On Tue, 11 Jun 2019 13:43:35 +0300
>> Henrik K wrote:
>>
>>
>> > Does the current stoplist actually do anything useful?  Someone
>> > should try 10-fold cross-validation with and without it.
>>
>> My understanding is that it was intended purely as a speed-up.
>
> Speedup plus less storage was the reason for removing stop words.
>
>> The words
>> are chosen to be neutral tokens that won't affect the final result.
>>
> These words will result in neutral tokens only when the user's primary
> language is English. For a Spanish user, an English mail is highly likely
> to be spam, hence we shouldn't remove stop words in this case.
>
>>
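
For illustration only, here is a minimal sketch of what a configurable
stoplist could look like. It is written in Python rather than
SpamAssassin's Perl, and the names (ENGLISH_STOPWORDS, tokenize, the
stoplist parameter) are hypothetical; none of this is SpamAssassin's
actual code or word list. The idea is that stop-word removal becomes a
setting the user can turn off, so a non-English user keeps the English
tokens that may be useful spam indicators.

```python
import re

# Hypothetical English stoplist; NOT SpamAssassin's actual list.
ENGLISH_STOPWORDS = frozenset({"the", "and", "for", "with", "that", "this"})


def tokenize(text, stoplist=ENGLISH_STOPWORDS):
    """Split text into lowercase word tokens, dropping any in `stoplist`.

    Pass an empty stoplist to keep every token, e.g. for a user whose
    primary language is not English.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in stoplist]


message = "This offer is for you and that is the truth"

# Default behaviour: the listed English stop words are removed.
filtered = tokenize(message)

# Stoplist disabled: every token is kept and can be learned by the classifier.
unfiltered = tokenize(message, stoplist=frozenset())
```

With the stoplist disabled there are more tokens to store and score,
which is the speed and storage trade-off mentioned above.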
