http://bugzilla.spamassassin.org/show_bug.cgi?id=3976
------- Additional Comments From [EMAIL PROTECTED] 2005-02-23 14:37 -------
Subject: Re: [RFE] Invisible URIs should tend to be ignored
On Wed, Feb 23, 2005 at 09:11:31AM -0800, [EMAIL PROTECTED] wrote:
> > I'd favor an approach that would give slightly better odds (like 2-to-1)
> > for more visible URLs over less visible URLs, but I think a larger shift
> > would be too juicy a reward and possibly lead to a lesser disaster.
>
> In the new scheme (above), this is possible since the plugin can simply get
> the list of all URIs and then do whatever it wants based on where it came
> from. Basically:
Here's what I came up with for the new version I'm testing:
HTML is parsed and uris are stored in an array per tag type.
get_uri_list() still returns the full list as it did before.
The URIRBL plugin implements the following logic:
- Generate an array of uri arrays, in the order(*) that we want to deal with
them.
e.g.:
a_uris => ...,
form_uris => ...,
...
- Per ordered element, reduce the domains to the unique ones that aren't
already in the "to be queried" list. If those unique domains won't bring the
query list over max, just add them and go to the next ordered element.
Otherwise, pick random entries from the element until the query list equals
the max, then stop.
(*) The order right now is:
a
form
img
<everything else from html but empty a>
empty a
parsed from rendered message
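
A minimal sketch of the selection logic described above, written in Python for illustration rather than in the plugin's own Perl. The names ORDER, pick_domains, domains_by_source and max_queries are hypothetical and do not come from the URIRBL plugin. The sketch also assumes that domains have already been extracted from the URIs and grouped by source.

```python
import random

# Hypothetical source keys, in the priority order listed above.
ORDER = ["a", "form", "img", "other", "a_empty", "parsed"]


def pick_domains(domains_by_source, max_queries, rng=random):
    """Return at most max_queries domains, favouring higher-priority sources."""
    selected = []
    seen = set()
    for source in ORDER:
        room = max_queries - len(selected)
        if room <= 0:
            break
        # Unique domains from this source that are not already queued.
        fresh = []
        for domain in domains_by_source.get(source, []):
            if domain not in seen and domain not in fresh:
                fresh.append(domain)
        if len(fresh) <= room:
            # Everything fits: take it all and move to the next source.
            selected.extend(fresh)
            seen.update(fresh)
            continue
        # Too many: fill the remaining slots at random, then stop.
        selected.extend(rng.sample(fresh, room))
        break
    return selected


if __name__ == "__main__":
    example = {
        "a": ["x.example", "y.example", "x.example"],
        "img": ["y.example", "z.example", "w.example"],
    }
    print(pick_domains(example, 3))
```

Because lower-priority sources are only consulted when room remains, a message cannot crowd out its visible links by padding itself with hidden ones. The random pick applies only within the source that overflows the limit.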