http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4691
------- Additional Comments From [EMAIL PROTECTED] 2006-02-12 18:32 -------
I've been running it for 2+ months on a URI scraper that I use for uribl.com,
and for about 4-5 weeks on ~1000 MTAs. Since I'm scraping a lot of HTML data, I
have to use lots of rawbody rules, and it comes in quite handy. Something that
I couldn't catch before was this:
<html>
www.myspamuri.com
</html>
and now I can:
rawbody HTML_URI_ONLY m'<html> ?(<body> )?(www\.)?[a-z0-9\-]{5,64}\.(com|net|info|biz) ?(</body>)? ?</html>'i
range HTML_URI_ONLY bytetrim 0:256
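
The pattern itself can be checked outside SpamAssassin. The following is only a rough sketch in Python (the regex syntax used here is the same as in Perl). The normalize() helper is hypothetical: it assumes that the buffer the rule sees is the first 256 characters of the raw body with runs of whitespace collapsed to single spaces, which is a guess at what "bytetrim 0:256" does and is not taken from the patch.

```python
import re

# The rule's pattern, rejoined onto one line, with the /i flag.
HTML_URI_ONLY = re.compile(
    r"<html> ?(<body> )?(www\.)?[a-z0-9\-]{5,64}\.(com|net|info|biz) ?(</body>)? ?</html>",
    re.IGNORECASE,
)

def normalize(raw, limit=256):
    """Hypothetical stand-in for 'bytetrim 0:256' (an assumption, not the patch's code):
    keep the first `limit` characters and collapse whitespace runs to single spaces."""
    return " ".join(raw[:limit].split())

sample = "<html>\nwww.myspamuri.com\n</html>\n"

# With the newlines still in place the pattern cannot match...
print(HTML_URI_ONLY.search(sample) is not None)
# ...but it does once the whitespace has been collapsed.
print(HTML_URI_ONLY.search(normalize(sample)) is not None)
```

If the real trimming behaves differently (for example, if it removes whitespace entirely), the optional " ?" parts of the pattern would still allow a match, but that has not been checked against the patch.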
A couple of things I'll note about my previous patch. The first dbg() call in
get_range_data() produces a lot of debug output because it's called once per
rule, so it should be removed or commented out.
Also, negative offsets supplied on a byte range do not work, due to this line:
+ if ($args && $args =~ m/(\d+)(:(\d+))?/) {
should be
+ if ($args && $args =~ m/(\-?\d+)(:(\d+))?/) {
This makes a rule like this start to work.
body __FREEBIE_FOOTER /Home.{1,5}Disclaimer.{1,5}Privacy Policy.{1,5}Unsubscribe/i
range __FREEBIE_FOOTER byte -256:256
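
The difference between the two patterns quoted above can be seen in isolation. This is a minimal sketch in Python rather than the patch's Perl (the regex syntax used is identical in both). parse_range() is a hypothetical helper written for illustration, not a function from the patch, and it makes no claim about what the two numbers mean beyond returning them in order.

```python
import re

# The two patterns from the comment: before and after the fix.
ORIGINAL = re.compile(r"(\d+)(:(\d+))?")
FIXED = re.compile(r"(\-?\d+)(:(\d+))?")

def parse_range(pattern, args):
    """Hypothetical helper: return the (first, second) numbers captured from
    args, with second set to None when the ':N' part is absent."""
    m = pattern.search(args)
    if m is None:
        return None
    first = int(m.group(1))
    second = int(m.group(3)) if m.group(3) is not None else None
    return first, second

# The unanchored original pattern skips the leading '-' and still matches,
# so the sign is silently lost rather than the match failing.
print(parse_range(ORIGINAL, "-256:256"))
# The fixed pattern captures the sign as part of the first number.
print(parse_range(FIXED, "-256:256"))
```

Non-negative arguments such as "0:256" parse the same way under both patterns, so the change should only affect rules that supply a negative first value.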
I can make a new patch if necessary, but nothing else has changed. Range
checking should probably be added to full rule types as well.
d