Your basic theory is sound; however, it is not quite that cut-and-dried in
practice. The majority of the rules we code into Message Sniffer are based on
the premise of "attacking the redirection"... that is, filter on where "they"
want us to go. This makes these rules very effective if they are coded
properly.
However, the spammers go out of their way to thwart this kind of
filtering by obfuscating domains, phone numbers, and anything else they can
to hide the information (such as hiding phone numbers in images). This
places a limit on effectiveness, but doesn't eliminate it since the costs of
generating successful obfuscations are very high...
Another limiting factor for spammers' obfuscation techniques is that the
techniques themselves are often easier to filter than the individual targets.
For example, the Russian spammers this week shot themselves in the
foot when they started to encode their phone numbers in a vertical column on the
side of their email messages. The intent was to obfuscate their phone numbers
with a lot of embedded HTML code. Whereas previously we had to capture each new
Russian spam by tagging the unique phone number of each spam, now we can capture
huge swaths of Russian spam by coding for the obfuscation technique (and its
variants).
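To illustrate what coding for the technique rather than the target can look
like, here is a minimal Python sketch. It is not a Message Sniffer rule; the
pattern, the seven-digit threshold, and the name find_split_numbers are all
assumptions made for illustration. It flags runs of single digits that are
separated from each other only by HTML markup, which is roughly what a number
laid out one digit per table row looks like in the raw message.

```python
import re

# One or more HTML tags, with optional whitespace or &nbsp; around them.
TAG_RUN = r"(?:\s|&nbsp;)*(?:<[^>]+>(?:\s|&nbsp;)*)+"

# Hypothetical rule: seven or more single digits, each separated from the
# next by nothing but markup. The threshold of seven is an assumption.
SPLIT_DIGITS = re.compile(r"\d(?:" + TAG_RUN + r"\d){6,}", re.IGNORECASE)


def find_split_numbers(html):
    """Return the digit strings hidden inside tag-separated runs."""
    hits = []
    for match in SPLIT_DIGITS.finditer(html):
        # Drop whole tags first so digits inside attributes are not counted,
        # then drop every remaining non-digit character.
        hits.append(re.sub(r"<[^>]+>|\D", "", match.group(0)))
    return hits
```

A legitimate table of single-digit cells would also trip this pattern, so a
real rule would need more context than this sketch uses.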
--- Regarding the proposal of sending domains, links, and phone numbers
to a central place to coordinate them... We have quite a bit of experience with
this. I can tell you that a significant amount of manual effort will be required
to "edit" any list of candidates, and some significant effort will also be
required to extract candidates in the face of obfuscation... We have a good deal
of automation helping us with this effort - that is how we can afford to do it
for the low price we charge... I doubt it could be done for free, but it's worth
a try since the results can be significant even if they are not
complete.
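For reference, the mechanical part of the proposal (pull hosts out of the
body, then ask a central server about each one in IP4r fashion) might be
sketched roughly as below. The zone name contentbl.example.org and the function
names are hypothetical, and prefixing a domain name to the zone is an assumed
convention; only the reversed-octet form for IP addresses is the usual IP4r
layout. The sketch does none of the editing or de-obfuscation described above,
which is where the real effort goes.

```python
import re

# Hypothetical zone; no such public service is implied here.
ZONE = "contentbl.example.org"

# Capture the authority part (host, optional user info and port) of a link.
URL_AUTHORITY = re.compile(r'''https?://([^/\s"'<>]+)''', re.IGNORECASE)
IPV4 = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def extract_hosts(body):
    """Return the unique hosts named in http(s) links, in order seen."""
    hosts = []
    for authority in URL_AUTHORITY.findall(body):
        host = authority.rsplit("@", 1)[-1]  # drop any user:pass@ prefix
        host = host.split(":", 1)[0]         # drop any :port suffix
        host = host.lower().rstrip(".")
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def query_name(host, zone=ZONE):
    """Build the DNS name a client would look up for this host."""
    if IPV4.match(host):
        # IP4r lists reverse the octets: 192.0.2.10 -> 10.2.0.192.<zone>
        return ".".join(reversed(host.split("."))) + "." + zone
    # Assumed convention for names: prefix the host to the zone as-is.
    return host + "." + zone
```

Each name returned by query_name would then be handed to an ordinary DNS
lookup, with an answer meaning "listed" and a lookup failure meaning "not
listed", the same way an IP4r test is read.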
The other thing I can tell you about this is that you will _NEED_ to have
a contextual reference for the candidates that you send to the central
coordinator. It is nearly impossible to determine which candidates should be
"filtered" without actually studying the email that contained them (even
if you have some very powerful AI and a strong corpus to test
with).
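As a rough idea of what a submission with a contextual reference could carry,
here is a small Python sketch. The Candidate record, its field names, and the
40-character radius are assumptions for illustration, not a format anyone in
this thread has defined. The snippet is a bare minimum; the message id is there
so a reviewer can get back to the full email.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    value: str       # the domain, link, or phone number as extracted
    kind: str        # for example "url", "email", or "phone"
    message_id: str  # lets a reviewer pull the full message later
    snippet: str     # text immediately surrounding the candidate


def with_context(body, value, kind, message_id, radius=40):
    """Wrap a candidate with the text found around its first occurrence."""
    pos = body.find(value)
    if pos < 0:
        raise ValueError("candidate not found in body")
    start = max(0, pos - radius)
    end = min(len(body), pos + len(value) + radius)
    return Candidate(value, kind, message_id, body[start:end])
```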
A case in point... spammers have recently begun to heavily lace spam with
visible and invisible references to legitimate URLs and email addresses so that
automated filtering systems will have trouble... I believe this is partially a
response to Bayesian (Grahamian?) techniques under the theory that a legitimate
email address, URL, and domain would be statistically weighted toward the ham
corpus across a wide group...
I hope this info is helpful,
_M
Pete McNeil (Madscientist)
President, MicroNeil Research Corporation
Chief SortMonster (www.sortmonster.com)
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kami Razvan
Sent: Friday, July 11, 2003 9:05 AM
To: [EMAIL PROTECTED]
Subject: [Declude.JunkMail] URL's in Body as IP4r type..

Hi;

I am just brainstorming.. Pro.. con?

We know one thing about spam.. someone is trying to sell something.. so in
every spam there has to be a way for the spammer to be contacted through:

1: Web site visit (URL or IP),
2: email
3: Phone number

In general I have seen no more than one or two of the above unique entries in
a single spam.

In the absence of a point of contact there is no point in the broadcasted mass
mail.

Of course the above is the obvious ..

While all IP4r tests concentrate on finding the point of origin of the email
what if we try to block the email content?

So what if..

1: An added program be written as an add-on to Declude that extracts the
unique emails, URL's, IP's or phone numbers from the body of the email.
2: Sends these numbers as query to a server much like the IP4r tests for
response.

Would this not work?

I know with our filter tests we have pretty much blocked all spam. In the last
month I have had one spam that came through and the rest are all blocked. So
if we are to expand on this the logical step, in my opinion, is to have a
centralized check point for all the entries we have.

We can brainstorm about this and bring out bad, good, what if's, .. may be
collectively we can solve this problem.

Bad idea!?

Regards,
Kami
