Re: Reminder:: 3.3.0 pre-release cut: December 17th

Sidney Markowitz Thu, 17 Dec 2009 00:29:10 -0800

Henrik K wrote, On 17/12/09 7:37 PM:

Justin are you the only one who knows about TextCat? Have you looked at it?

I was involved with it when we first ported it to SpamAssassin, but it's been years since I looked at it. I think that I may be the person most familiar with it, though. I'm afraid that I didn't notice that bug in the database.

Uppercase characters are a tricky problem that had not occurred to me. If TextCat is going to recognize languages in multibyte charsets without trying to do any kind of charset decoding, then it can't lowercase all the characters as if it is assuming that they are Roman ASCII. Unless we train it on all-uppercase English as a separate language, it won't recognize it as the English that it trained on.
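
A minimal Python sketch of both halves of that problem follows. It is an illustration only, not TextCat's or SpamAssassin's code: the helper names (ascii_lower, trigrams), the Shift-JIS sample character, and the use of plain trigram sets in place of a real language model are all assumptions made for the example.

```python
# Illustrative sketch only; this is NOT TextCat's or SpamAssassin's code.
# ascii_lower and trigrams are hypothetical helpers invented for this example.

def ascii_lower(raw: bytes) -> bytes:
    """Lowercase ASCII A-Z only, treating every byte independently."""
    return raw.lower()

def trigrams(text: str) -> set:
    """Return the set of character trigrams in text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

# 1. Byte-wise lowercasing can corrupt text in a multibyte charset,
#    because a trailing byte may fall in the ASCII A-Z range.
original = "ア"  # katakana A, chosen as an assumed sample character
raw = original.encode("shift_jis")
mangled = ascii_lower(raw).decode("shift_jis", errors="replace")
print(raw, "->", ascii_lower(raw))
print("decodes to a different character:", mangled != original)

# 2. Without lowercasing, all-uppercase text shares no trigrams with a
#    model built from lowercase text, so it looks like another language.
model = trigrams("the quick brown fox")
shouting = trigrams("THE QUICK BROWN FOX")
print("shared trigrams:", len(model & shouting))
```

The first part shows why blind lowercasing is unsafe without charset decoding; the second shows why skipping it leaves all-uppercase English unmatched.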

I guess any more comments on the bug itself ought to be placed in Bugzilla. This mailing list is a fair place to discuss whether it should be considered a blocker for 3.3.0. Personally, I don't think it is. It may be the case that it is a deficiency in using TextCat in SpamAssassin, but it is one that it has always had, among others. It would be good if someone came up with a way for it to be smarter about charsets, but I don't think that can happen in the 3.3.0 time frame.


 -- sidney
