[email protected] wrote, On 6/26/13 3:54 AM:
> The users list would've been a more appropriate place to post this.
>
> This web search appears to give useful results: spamassassin language
>
I agree that questions about how to use an existing feature of SpamAssassin, or in this case a question about whether SpamAssassin has some feature that a quick look makes it appear not to have, are better asked on the Users list.

However, I want to point out that the language detection method that I helped put into SpamAssassin many years ago, textcat, has not proven to be all that practical. This dev list would be the correct forum for discussing better ways to detect language, if anyone has ideas.

Based on what I see in the abstract, I would start by looking into Radim Řehůřek and Milan Kolkus' 2009 paper "Language Identification on the Web: Extending the Dictionary Method". The method described in their paper seems to be simple, elegant, and a logical improvement over textcat. However, I haven't tried it yet. Has anyone on the list had experience with it? I see that there is an online implementation available to play with at http://mlcomp.org/programs/633 but I don't see much mention of it besides that.
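For anyone who hasn't looked at how textcat-style detection works, here is a rough Python sketch of the character n-gram rank-order idea (Cavnar and Trenkle's "out-of-place" measure) that textcat is, as I understand it, based on. This is not SpamAssassin's code: the function names, the two toy training sentences, and the parameter defaults are all made up for illustration, and real language profiles are built from far more text.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top_k=300):
    """Return the text's character n-grams (n = 1..max_n), most frequent first."""
    # Keep letters only, lowercase them, and split into words.
    words = "".join(c.lower() if c.isalpha() else " " for c in text).split()
    counts = Counter()
    for word in words:
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams absent from lang_profile get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def guess_language(text, profiles):
    """Pick the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Toy training data (hypothetical, far too small for real use).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "de": "der schnelle braune fuchs springt über den faulen hund und dann schläft der hund in der sonne",
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}

print(guess_language("the dog and the fox", PROFILES))
```

Everything here rests on n-gram rank statistics alone, so short messages give it very little to work with, and that is one reason a word-dictionary approach like the one in the paper might be worth comparing against.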
