-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Simon --

thanks, I think you're right on the money there, in all respects
except for the venue.  could you open this as a bug on the bugzilla,
at bugzilla.SpamAssassin.org?

cheers,

- --j.

Simon McCorkindale writes:
> Platform: FreeBSD 5.4-RC3
> Perl: 5.8.6
> SpamAssassin: 3.0.4
> 
> I'm a volunteer for the www.rbl.jp project and I think I've come across
> a bug in SA. I searched for any previous posts of this bug but couldn't
> find anything. I know this isn't the right place to post bugs but I want
> to discuss my attempts to fix it.
> 
> The problem is that when certain characters from the JIS character set
> immediately follow a URI, the URI is not detected properly.
> 
> The URL I used for testing is listed in our url.rbl.jp black list and
> numerous others. It is http://www.j-*sine.com but with the * removed
> (just to make sure this mail gets through the mailing list :-)
> 
> If there are any JIS characters immediately following the m at the end
> of j-sine.com, then what is extracted will be http://www.j-*sine.com
> plus a chunk of the JIS characters.
> 
> Hence, when SpamAssassin queries url.rbl.jp to see if this URL is
> registered it gets a not-registered reply.
> 
> I had a hunt through the Perl code and did many test simulations and
> managed to track the source of the problem down to PerMsgStatus.pm.
> Between lines 1733 and 1745 of this file the regular expressions for
> detecting URIs are defined. I'm not a wizard with regular expressions, so
> a lot of it is over my head.
> 
> Using my old friend od I tracked the culprit JIS character down. It
> seems to be the ESC (hex 1B) character. I don't know much about JIS but
> I'm guessing this is used to define the start of a string of JIS
> characters.
> 
> On line 1735 of PerMsgStatus.pm there is the line:
> 
> my $unreserved = "A-Za-z0-9\Q$mark#\E\x00-\x08\x0b\x0c\x0e-\x1f";
> 
> so I modified it to:
> 
> my $unreserved = "A-Za-z0-9\Q$mark#\E\x00-\x08\x0b\x0c\x0e-\x1a\x1c-\x1f";
> 
> so that \x1b isn't included and this seems to have solved the problem.
> 
> I think this is an ugly hack that probably breaks other stuff or goes
> against certain rules, but I would like to hear anybody's ideas on
> this dilemma.
> 
> Thanks in advance,
> Simon.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFC9+t1MJF5cimLx9ARAlvVAJ91JisPDNnVTXD3/wS60+sSkxON+QCgki0v
pUMBbtaLdp174ZvFMrRvQ+4=
=PNIt
-----END PGP SIGNATURE-----
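
For anyone who wants to reproduce the behaviour Simon describes, here is a
minimal sketch. It is not the actual code from PerMsgStatus.pm: the $mark and
$reserved values, the extract_uri helper and the www.example.com host are all
assumptions made for illustration, and the pattern is far simpler than the
real one. Only the two $unreserved strings are taken from the message above.
The sample text uses the ISO-2022-JP escape sequences ESC $ B and ESC ( B
around a few two-byte characters.

```perl
use strict;
use warnings;

# Assumed values, loosely modelled on RFC 2396; not copied from SpamAssassin.
my $mark     = q{-_.!~*'()};
my $reserved = quotemeta(q{;/?:@&=+$,});

# The two character classes quoted in the message: with and without \x1b.
my $with_esc    = "A-Za-z0-9\Q$mark#\E\x00-\x08\x0b\x0c\x0e-\x1f";
my $without_esc = "A-Za-z0-9\Q$mark#\E\x00-\x08\x0b\x0c\x0e-\x1a\x1c-\x1f";

# Hypothetical helper: return the first http(s) URI found in $text, using
# the given "unreserved" class plus the assumed reserved characters.
sub extract_uri {
    my ($text, $unreserved) = @_;
    return $text =~ m{(https?://[${unreserved}${reserved}]+)} ? $1 : undef;
}

# ESC $ B switches to JIS X 0208, ESC ( B switches back to ASCII.
my $jis  = "\x1b\x24\x42\x24\x33\x24\x73\x24\x4b\x24\x41\x24\x4f\x1b\x28\x42";
my $text = "Visit http://www.example.com" . $jis . " today";

for my $case (["with \\x1b", $with_esc], ["without \\x1b", $without_esc]) {
    my ($label, $class) = @$case;
    my $uri = extract_uri($text, $class);
    (my $shown = $uri) =~ s/\x1b/<ESC>/g;    # make the ESC bytes visible
    print "$label: $shown\n";
}
```

In this sketch the first class lets the match run on through the ESC and the
JIS bytes after it, while the second stops at the end of the host name. It
says nothing about whether dropping \x1b is safe for other inputs, which is
the part worth discussing on the bugzilla.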
