On Thursday, December 17, 2009, Sidney Markowitz <[email protected]> wrote:
> Henrik K wrote, On 17/12/09 7:37 PM:
>
> Justin are you the only one who knows about TextCat? Have you looked at it?
>
>
> I was involved with it when we first ported it to SpamAssassin, but it's been 
> years since I looked at it. I think that I may be the person most familiar 
> with it, though. I'm afraid that I didn't notice that bug in the database.
>
> Uppercase characters are a tricky problem that had not occurred to me. If 
> textcat is going to recognize languages in multibyte charsets without trying 
> to do any kind of charset decoding, then it can't lowercase all the 
> characters as if it is assuming that they are Roman ASCII. Unless we train it 
> on all-uppercase English as a separate language, it won't recognize it as the 
> English that it trained on.
>
> I guess any more comments on the bug itself ought to be placed in Bugzilla. 
> This mailing list is a fair place to discuss whether it should be considered 
> a blocker for 3.3.0. Personally, I don't think it is. It may be the case that 
> it is a deficiency in using TextCat in SpamAssassin, but it is one that it has 
> always had, among others. It would be good if someone came up with a way for 
> it to be smarter about charsets, but I don't think that can happen in the 
> 3.3.0 time frame.

+1
>
>  -- sidney
>
>

-- 
--j.
