Interesting thoughts earlier on the ranking of URIs.

Another way to look at the issue would be to take the entire email into
account.

Possibly make this one of the last tests, and use some of the other tests
as context clues.

One I am thinking of is the IMAGE_ONLY test (I think that was the name of
it).  One of the spammy things I have seen is a lot of URLs for images
only.

Or another one maybe would be URI only.  

The most common spam we see, though (which is a counterexample to this), is
the same URL over and over, intermixed with some nonsensical text ("A dog in
the woods like cheese"), then a URL for hardcore porn.

What I am driving at is that one way to limit the URI decoding may be to
look at the contextual clues in the email.  So you continue looking for the
"payload" if the email shows spamminess, but cut it off at certain levels.
For example (this could also be rule adjustable):

Score < 0      : scan X URIs, with X < 10
0 < Score < 10 : scan all URIs until the score gets above 10
Score > 10     : scan fewer than 10 URIs

That way we can limit the memory/cycle usage.
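
To make the idea concrete, here is a rough sketch in Python.  It is not
SpamAssassin code; scan_uris, score_uri, LOW_CAP and SPAM_THRESHOLD are all
names I made up for illustration, and the cap of 10 is only a placeholder
for whatever "fewer than 10" ends up being.  The scheme above does not say
what happens at exactly 0 or 10, so the sketch assumes those scores fall in
the middle band.

```python
from typing import Callable, Iterable, Tuple

LOW_CAP = 10          # hypothetical cap for mail that already looks like ham or spam
SPAM_THRESHOLD = 10   # hypothetical score above which the mail counts as spam


def scan_uris(
    uris: Iterable[str],
    score: float,
    score_uri: Callable[[str], float],
    cap: int = LOW_CAP,
    threshold: float = SPAM_THRESHOLD,
) -> Tuple[float, int]:
    """Scan URIs under a budget chosen from the score so far.

    Returns (final_score, number_of_uris_scanned).
    score_uri is a stand-in for whatever per-URI test would be run.
    """
    # The band is chosen once, from the score the other tests produced.
    undecided = 0 <= score <= threshold
    scanned = 0
    for uri in uris:
        if undecided:
            # Middle band: keep going until the mail is clearly spam.
            if score > threshold:
                break
        elif scanned >= cap:
            # Clearly ham or clearly spam: stop after the cap.
            break
        score += score_uri(uri)
        scanned += 1
    return score, scanned
```

With 50 URIs and a starting score of 5, a per-URI hit of 2 points stops the
scan after 3 URIs, while a starting score of 12 stops it after the cap no
matter what the URIs score.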



Original Message:
-----------------
From: Chris Santerre [EMAIL PROTECTED]
Date: Wed, 13 Oct 2004 09:50:06 -0400
To: [EMAIL PROTECTED], [email protected]
Subject: RE: limit on number of URIs decoded?




>-----Original Message-----
>From: Fred [mailto:[EMAIL PROTECTED]
>Sent: Wednesday, October 13, 2004 9:43 AM
>To: [email protected]
>Subject: Re: limit on number of URIs decoded?
>
>
>> Anyway to see stats on number of URLs for Ham/Spam? I'm 
>thinking more then
>> 100 URLs is rather large for an email. You say small. I must 
>be missing
>> something. Is more then 100 URLs a spam/ham sign?
>>
>> --Chris
>
>Imagine a large HTML e-mail with links and images and the 
>like.  If you have 
>images that are links, you have 2 URLs for each image.  I can 
>think of a few 
>newsletters who include a lot of urls like this.
>
>It's also possible that a message will include it's links in a 
>html part and 
>a plaintext part.
>
>What about e-mail addresses?  When people forward the same message 400 
>times, they often include the names of each person who ever 
>received a copy 
>of this forward, I think SA treat these e-mails as URLs. 


*lightbulb*

Ah, thanks. I hadn't thought about the image/link being 2, and the
forwarding of the emails. 

Thanks

--Chris


