Your basic theory is sound; however, it is not quite that cut-and-dried in practice. The majority of the rules we code into Message Sniffer are based on the premise of "attacking the redirection"; that is, filtering on where "they" want us to go. This makes these rules very effective if they are coded properly.
 
However, the spammers go out of their way to thwart this kind of filtering by obfuscating domains, phone numbers, and anything else they can to hide the information (such as hiding phone numbers in images). This places a limit on effectiveness but doesn't eliminate it, since the costs of generating successful obfuscations are very high.
 
Another limiting factor for spammers' obfuscation techniques is that the techniques themselves are often easier to filter than the individual targets.
 
For example, the Russian spammers this week shot themselves in the foot when they started to encode their phone numbers in a vertical column on the side of their email messages. The intent was to obfuscate their phone numbers with a lot of embedded HTML code. Previously, we had to capture each new Russian spam by tagging the unique phone number in each one. Now we can capture huge swaths of Russian spam by coding for the obfuscation technique (and its variants).
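
To show why the technique is easier to target than each number, here is a minimal sketch of a rule that matches the layout rather than the digits. It assumes the vertical column is built from one digit per "line" of HTML, meaning each digit is followed by tags such as <br> or </td></tr><tr><td>. The 7-digit minimum and the function names are hypothetical. This is not how Message Sniffer rules are actually written.

```python
import re

# Assumption: each digit sits on its own "line" of HTML, i.e. a single digit
# followed by one or more tags such as <br> or </td></tr><tr><td>.
# Seven or more digits laid out that way are treated as the obfuscation technique.
DIGIT_THEN_TAGS = r"\d\s*(?:<[^>]+>\s*)+"
VERTICAL_RUN = re.compile(r"(?:" + DIGIT_THEN_TAGS + r"){6,}\d")


def uses_vertical_obfuscation(html: str) -> bool:
    """True if the message contains at least one vertically laid-out number."""
    return VERTICAL_RUN.search(html) is not None


def recover_numbers(html: str) -> list[str]:
    """Strip the tags from each vertical run and return the bare digit strings."""
    numbers = []
    for match in VERTICAL_RUN.finditer(html):
        text = re.sub(r"<[^>]+>", "", match.group(0))  # drop the HTML tags
        numbers.append(re.sub(r"\D", "", text))        # keep only the digits
    return numbers


sample = "<td>8<br>0<br>0<br>5<br>5<br>5<br>1<br>2<br>3<br>4</td>"
print(uses_vertical_obfuscation(sample), recover_numbers(sample))
```

The rule never mentions a particular phone number. Every new number laid out the same way trips it, and that is the advantage described above.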
 
--- Regarding the proposal of sending domains, links, and phone numbers to a central place to coordinate them: we have quite a bit of experience with this. A significant amount of manual effort will be required to "edit" any list of candidates, and significant effort will also be required to extract candidates in the face of obfuscation. We have a good deal of automation helping us with this effort; that is how we can afford to do it for the low price we charge. I doubt it could be done for free, but it's worth a try, since the results can be significant even if they are not complete.
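
Here is a minimal sketch of the automated half of that workflow. It assumes candidates arrive as plain strings from many participating sites. The hypothetical `triage` function counts reports, drops anything on an allowlist, and queues the rest for a human editor. The allowlist contents and the `min_reports` threshold are assumptions, not a description of how MicroNeil actually does it.

```python
from collections import Counter
from typing import Iterable


def triage(reports: Iterable[str], allowlist: set[str],
           min_reports: int = 3) -> list[tuple[str, int]]:
    """Build a review queue for a human editor.

    reports     -- candidates (domains, phone numbers, ...) submitted by many sites
    allowlist   -- lowercase candidates that must never be listed (hypothetical)
    min_reports -- how many reports a candidate needs before review (assumed)
    """
    counts = Counter(candidate.strip().lower() for candidate in reports)
    queue = [(candidate, n) for candidate, n in counts.items()
             if candidate not in allowlist and n >= min_reports]
    # Most-reported candidates first, ties broken alphabetically.
    queue.sort(key=lambda item: (-item[1], item[0]))
    return queue


reports = ["spam-shop.example"] * 2 + ["Spam-Shop.example", "rare.example"] + ["google.com"] * 5
print(triage(reports, {"google.com"}))
```

Even with this in place, every entry in the queue still needs a person to decide whether it belongs on the list.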
 
The other thing I can tell you about this is that you will _NEED_ to have a contextual reference for the candidates that you send to the central coordinator. It is nearly impossible to determine which candidates should be "filtered" without actually studying the email that contained them (even if you have some very powerful AI and a strong corpus to test with).
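
One way to carry that contextual reference is to submit each candidate together with the text around it and an identifier for the message it came from. The sketch below assumes that approach. The `Candidate` record, the `message_id` field, and the 40-character `radius` are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    kind: str        # "url", "email", "ip" or "phone"
    value: str       # the extracted candidate itself
    message_id: str  # hypothetical: lets the coordinator request the full message
    context: str     # text surrounding the candidate in the original body


def make_candidate(kind: str, body: str, start: int, end: int,
                   message_id: str, radius: int = 40) -> Candidate:
    """Package body[start:end] with up to `radius` characters on either side."""
    lo = max(0, start - radius)
    hi = min(len(body), end + radius)
    return Candidate(kind, body[start:end], message_id, body[lo:hi])


body = "Buy now at http://cheap.example today!"
start = body.index("http")
print(make_candidate("url", body, start, start + len("http://cheap.example"), "msg-001"))
```

A short excerpt is often not enough to judge a candidate, which is why the record also keeps a way back to the whole message.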
 
A case in point: spammers have recently begun to heavily lace spam with visible and invisible references to legitimate URLs and email addresses so that automated filtering systems will have trouble. I believe this is partially a response to Bayesian (Grahamian?) techniques, under the theory that a legitimate email address, URL, or domain would be statistically weighted toward the ham corpus across a wide group of users.
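
A small worked example shows why lacing can work against a Graham-style filter. The sketch uses the combining rule from Paul Graham's "A Plan for Spam": keep the most "interesting" tokens (those farthest from 0.5), then compute P = prod(p) / (prod(p) + prod(1 - p)). The per-token probabilities are invented for illustration and do not come from any real corpus.

```python
from math import prod


def combined_spam_probability(token_probs: list[float], n: int = 15) -> float:
    """Graham-style combination: keep the n tokens whose spam probability is
    farthest from 0.5, then compute prod(p) / (prod(p) + prod(1 - p))."""
    interesting = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]
    spam = prod(interesting)
    ham = prod(1 - p for p in interesting)
    return spam / (spam + ham)


# Hypothetical per-token spam probabilities.
spam_tokens = [0.99, 0.95]   # e.g. the spammer's own domain and phone number
legit_tokens = [0.01, 0.01]  # e.g. well-known legitimate URLs and addresses

plain = combined_spam_probability(spam_tokens)
laced = combined_spam_probability(spam_tokens + legit_tokens)
print(f"plain: {plain:.3f}, laced: {laced:.3f}")  # plain: 0.999, laced: 0.161
```

A token that is strongly "hammy" is just as far from 0.5 as one that is strongly "spammy". Planted legitimate references therefore get selected as interesting and can pull the message well below a typical spam threshold.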
 
I hope this info is helpful,
_M
 
Pete McNeil (Madscientist)
President, MicroNeil Research Corporation
Chief SortMonster (www.sortmonster.com)
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kami Razvan
Sent: Friday, July 11, 2003 9:05 AM
To: [EMAIL PROTECTED]
Subject: [Declude.JunkMail] URL's in Body as IP4r type..

Hi;
 
I am just brainstorming. Pros? Cons?
 
We know one thing about spam: someone is trying to sell something. So in every spam there has to be a way for the spammer to be contacted, through:
 
1: Web site visit (URL or IP)
2: Email
3: Phone number
 
In general, I have seen no more than one or two unique entries of the above kinds in a single spam.
 
In the absence of a point of contact, there is no point in broadcasting the mass mail.
 
Of course, all of the above is obvious.
 
While all IP4r tests concentrate on finding the point of origin of the email, what if we tried to block based on the email's content instead?

So what if:

1: A program were written as an add-on to Declude that extracts the unique email addresses, URLs, IPs, or phone numbers from the body of the email.

2: It sends these entries as queries to a server, much like the IP4r tests, and acts on the response.
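
Here is a minimal sketch of the two steps, using the DNS-query style that IP4r-type lists already use. IP addresses are looked up with their octets reversed under a list's zone, and domains are looked up as-is under the zone. The zone name `contacts.example.invalid` is a placeholder for the hypothetical central checkpoint. Encoding phone numbers as bare digit labels is an assumption, not an existing convention. The regexes are deliberately simple and would miss obfuscated entries. Email addresses are checked by their domain only, because "@" cannot appear in a DNS name.

```python
import re
import socket

# Hypothetical zone for the central "check point"; no such service is implied.
ZONE = "contacts.example.invalid"

URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)
EMAIL_DOMAIN = re.compile(r"[\w.+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
PHONE = re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b")


def extract_candidates(body: str) -> set[tuple[str, str]]:
    """Step 1: pull unique (kind, value) pairs out of a message body."""
    found = set()
    hosts = [h.rstrip(".").lower() for h in URL_HOST.findall(body)]
    hosts += [h.lower() for h in EMAIL_DOMAIN.findall(body)]
    for host in hosts:
        found.add(("ip" if IPV4.fullmatch(host) else "domain", host))
    for ip in IPV4.findall(body):
        found.add(("ip", ip))
    for phone in PHONE.findall(body):
        found.add(("phone", re.sub(r"\D", "", phone)))  # digits only
    return found


def query_name(kind: str, value: str, zone: str = ZONE) -> str:
    """Build the DNS name to look up, IP4r-style for addresses."""
    if kind == "ip":
        return ".".join(reversed(value.split("."))) + "." + zone
    return value + "." + zone


def is_listed(name: str) -> bool:
    """Step 2: any A record means "listed"; a failed lookup means "not listed"."""
    try:
        socket.gethostbyname(name)
        return True
    except socket.gaierror:
        return False
```

A message would then score against the list if `is_listed(query_name(kind, value))` is true for any extracted pair. Reusing DNS this way means the existing DNS caching infrastructure can absorb repeat lookups, which is one reason DNS-based lists are popular.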
 
Would this not work?
 
I know that with our filter tests we have pretty much blocked all spam. In the last month, one spam came through and the rest were all blocked. So if we are to expand on this, the logical step, in my opinion, is to have a centralized checkpoint for all the entries we have.
 
We can brainstorm about this and bring out the bad, the good, and the what-ifs; maybe collectively we can solve this problem.
 
Bad idea!?
 
Regards,
Kami
 
