On Tue, 11 Jun 2019, 17:41 RW, <[email protected]> wrote:

> On Tue, 11 Jun 2019 13:43:35 +0300
> Henrik K wrote:
>
>
> > Does the current stoplist actually do anything useful?  Someone
> > should try 10-fold cross validation with and without..
>
> My understanding is that it was intended purely as a speed-up.

Speed-up plus reduced storage were the reasons for removing stop words.

> The words
> are chosen to be neutral tokens that won't affect the final result.
>
These words are neutral tokens only when the user's primary language is
English. For a Spanish-speaking user, an English mail is highly likely to
be spam, so the English stop words carry signal and we shouldn't remove
them in that case.
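
A rough sketch of what I mean, in Python rather than SpamAssassin's Perl.
Everything here is made up for illustration and is not taken from the
current code: the function name bayes_tokens, the user_lang and
stoplist_lang parameters, and the tiny stoplist. The stoplist is applied
only when its language matches the user's primary language.

```python
# Hypothetical illustration only; none of these names exist in SpamAssassin.
ENGLISH_STOPWORDS = frozenset({"the", "and", "of", "to", "is", "a"})


def bayes_tokens(text, user_lang, stoplist=ENGLISH_STOPWORDS, stoplist_lang="en"):
    """Split text into lowercase tokens.

    Stop words are dropped only when the stoplist's language matches the
    user's primary language. Otherwise they are kept, since for that user
    they may be useful evidence rather than neutral noise.
    """
    tokens = text.lower().split()
    if user_lang == stoplist_lang:
        return [t for t in tokens if t not in stoplist]
    return tokens


if __name__ == "__main__":
    mail = "The price of the offer"
    print(bayes_tokens(mail, "en"))  # stop words dropped for an English user
    print(bayes_tokens(mail, "es"))  # stop words kept for a Spanish user
```

Whether keeping those tokens actually improves accuracy would still need
the cross-validation Henrik suggested.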

