-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Simon -- thanks, I think you're right on the money there, in all
respects except for the venue.  could you open this on the bugzilla?
bugzilla.SpamAssassin.org .

cheers,

- --j.

Simon McCorkindale writes:
> Platform: FreeBSD 5.4-RC3
> Perl: 5.8.6
> SpamAssassin: 3.0.4
>
> I'm a volunteer for the www.rbl.jp project and I think I've come across
> a bug in SA. I searched for any previous posts of this bug but couldn't
> find anything. I know this isn't the right place to post bugs but I want
> to discuss my attempts to fix it.
>
> The problem is when some Japanese characters from the JIS character set
> immediately follow a URI then the URI is not detected properly.
>
> The URL I used for testing is listed in our url.rbl.jp black list and
> numerous others. It is http://www.j-*sine.com but with the * removed
> (just to make sure this mail gets through the mailing list :-)
>
> If there are any JIS characters immediately following the m at the end
> of j-sine.com then what is extracted will be the http://www.j-*sine.com
> plus a chunk of the JIS characters.
>
> Hence, when SpamAssassin queries url.rbl.jp to see if this URL is
> registered it gets a not-registered reply.
>
> I had a hunt through the Perl code and did many test simulations and
> managed to track the source of the problem down to PerMsgStatus.pm.
> Between lines 1733 and 1745 of this file the regular expressions for
> detecting URIs are defined. I'm not a wizard on regular expressions so a
> lot of it's over the top for me.
>
> Using my old friend od I tracked the culprit JIS character down. It
> seems to be the ESC (hex 1B) character. I don't know much about JIS but
> I'm guessing this is used to define the start of a string of JIS
> characters.
>
> On line 1735 of PerMsgStatus.pm there is the line:
>
> my $unreserved = "A-Za-z0-9\Q$mark#\E\x00-\x08\x0b\x0c\x0e-\x1f";
>
> so I modified it to:
>
> my $unreserved = "A-Za-z0-9\Q$mark#\E\x00-\x08\x0b\x0c\x0e-\x1a\x1c-\x1f";
>
> so that \x1b isn't included and this seems to have solved the problem.
>
> I think this is an ugly hack and probably breaking other stuff/going
> against certain rules etc but I would like to hear anybody's ideas on
> this dilemma.
>
> Thanks in advance,
> Simon.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFC9+t1MJF5cimLx9ARAlvVAJ91JisPDNnVTXD3/wS60+sSkxON+QCgki0v
pUMBbtaLdp174ZvFMrRvQ+4=
=PNIt
-----END PGP SIGNATURE-----
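
For anyone following along, here is a rough sketch of the effect Simon
describes. It is not the code from PerMsgStatus.pm: the $mark value is
assumed (the "mark" characters from RFC 2396), the pattern is a much
simplified stand-in for the real URI regexps, extract_uri() is a
hypothetical helper, and www.example.com replaces the real test URL. The
sample text puts an ISO-2022-JP escape sequence (ESC $ B) directly after
the hostname.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed stand-in for $mark (the RFC 2396 "mark" characters); the real
# definition in PerMsgStatus.pm may differ.
my $mark = "-_.!~*'()";

# Character class as quoted from line 1735, and the variant without \x1b.
my $with_esc    = "A-Za-z0-9\Q$mark#\E\x00-\x08\x0b\x0c\x0e-\x1f";
my $without_esc = "A-Za-z0-9\Q$mark#\E\x00-\x08\x0b\x0c\x0e-\x1a\x1c-\x1f";

# Hypothetical, heavily simplified matcher: "http://" followed by a run
# of characters from the given class. The real regexps are more involved.
sub extract_uri {
    my ($text, $class) = @_;
    return $text =~ m{(http://[$class]+)} ? $1 : undef;
}

# Hostname immediately followed by ISO-2022-JP text: ESC $ B switches to
# JIS X 0208, ESC ( B switches back to ASCII.
my $sample = "http://www.example.com\x1b\$BF|K\\8l\x1b(B";

for my $case (["with \\x1b", $with_esc], ["without \\x1b", $without_esc]) {
    my ($label, $class) = @$case;
    my $uri = extract_uri($sample, $class);
    (my $shown = $uri) =~ s/\x1b/<ESC>/g;   # make the ESC byte visible
    print "$label: $shown\n";
}
# with \x1b: http://www.example.com<ESC>
# without \x1b: http://www.example.com
```

In this cut-down version only the ESC byte is swallowed, because "$" is
not in the class used here; the real pattern accepts more characters, which
would fit Simon's report of a chunk of JIS bytes being extracted along with
the URL. Either way the extracted string no longer matches the listed
hostname.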
