First of all spam is anything
But what you're
asking for is the difference between our human brain and stupid computers (Pete,
your comment please ;-)
Generally I simply try to keep our customers' mailboxes as
clean as possible from all this automatically generated stuff. Human brains are so
intelligent, but computers are much faster at sending out billions of messages in a
very short time. Our life is too short to spend it handling all this
stuff manually.
For sure: There is also legit automatic stuff. In this case
the challenge is not to identify spam but to identify and let pass
computer-generated ham.
One good qualification for "bad content" is the weighting
system and combo tests. If many different tests fail on the same message, we all
know it's a good indicator of spam. If someone is sending out legit
messages that fail many different tests, then he is definitely doing something wrong
and has to rethink what he does in order to do it
successfully.
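To make the weighting idea concrete, here is a minimal sketch in Python. It is not how Declude is configured; the test names, the weights, the combo bonus and the hold weight are all made-up values for illustration. It only shows the principle: each failed test adds its weight, a combo test adds extra weight when all of its member tests failed on the same message, and the total is compared against the hold weight.

```python
# Hypothetical test names and weights, chosen only for illustration.
TEST_WEIGHTS = {
    "ip_blacklist": 5,
    "bad_headers": 4,
    "text_filter": 6,
    "no_reverse_dns": 3,
}

# A combo test adds extra weight only when ALL of its member tests failed.
COMBO_TESTS = {
    ("ip_blacklist", "text_filter"): 4,
}

HOLD_WEIGHT = 10  # assumed threshold at which a message is held


def total_weight(failed_tests):
    """Sum the weights of the failed tests plus any combo bonuses."""
    failed = set(failed_tests)
    weight = sum(TEST_WEIGHTS[name] for name in failed)
    for members, bonus in COMBO_TESTS.items():
        if failed.issuperset(members):
            weight += bonus
    return weight


def is_held(failed_tests):
    """A message is held once its total weight reaches the hold weight."""
    return total_weight(failed_tests) >= HOLD_WEIGHT
```

With these assumed numbers, a message failing only the IP blacklist scores 5 and passes, while one failing both the blacklist and the text filter scores 5 + 6 + 4 = 15 and is held. A single indicator alone never blocks a message.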
Consider the numerous "spam filters" out there that block
messages based on a single indicator of spam (for example, failing on one
single IP blacklist), or the pure text-filter solutions that catch only around
60% of spam.
As long as there are such services I and my customers can
live with the knowledge that nobody is 100% perfect.
At the moment I'm working on a new system that will classify
messages in the following 4 categories:
The notifications are a little bit
problematic:
-
As there are many customers using our server as a gateway, we don't know
if the recipient's address really exists. So at the moment I try to check whether
this recipient has received legit messages (<50% of the hold weight) in the
previous - let's say - two weeks. This should prevent us from sending out a large
number of unnecessary messages (for example after dictionary attacks on
gateway domains).
-
I want to send out as few notifications as possible. So I plan to
generate them twice a day: the first at around 9:00 am local
time, the second at around 5:00 pm. With this strategy I hope to notify each
recipient the same day the false positive was held on our system, but not
more than twice a day, even if I have enough data to send notifications
every hour (otherwise recipients with a big spam volume would receive a
notification each hour). The notification contains only a link (containing a
long random string as access security) to a dynamic website. This website will
show the recipient a list (datetime / sender / subject) of all messages between 120% and
170% of our current hold weight. I believe we can't send out notifications
containing recipient addresses and subject lines in the body, as spam filters
like the one included in MS Outlook would block them again. With the
dynamic website I can track the visits and so hold back any further notification
until the customer has visited the website. This should reduce our
notifications even further.
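The notification rules described in the two points above could be sketched roughly as follows. This is only an illustration in Python, not the actual tool: the hold weight value, the base URL, the function names and the shape of the per-recipient history are all assumptions. The idea is that a job running at the two daily times would call `should_notify` for each recipient.

```python
import secrets
from datetime import datetime, timedelta

HOLD_WEIGHT = 10                 # assumed hold threshold
LOOKBACK = timedelta(days=14)    # "let's say" two weeks
REVIEW_BASE_URL = "https://mail.example.com/review/"  # hypothetical URL


def in_grey_zone(weight, hold_weight=HOLD_WEIGHT):
    """True if the weight lies between 120% and 170% of the hold weight."""
    # Integer arithmetic avoids floating-point surprises at the boundaries.
    return 120 * hold_weight <= 100 * weight <= 170 * hold_weight


def recipient_looks_real(history, now, hold_weight=HOLD_WEIGHT):
    """history: iterable of (received_at, weight) pairs for one recipient.

    The address counts as real if it received at least one message below
    50% of the hold weight within the lookback window.
    """
    cutoff = now - LOOKBACK
    return any(
        received_at >= cutoff and 2 * weight < hold_weight
        for received_at, weight in history
    )


def make_review_link():
    """Build a review link whose only access control is a long random token."""
    return REVIEW_BASE_URL + secrets.token_urlsafe(32)


def should_notify(held_weights, history, now, pending_unvisited):
    """Decide whether a recipient gets a notification in this run."""
    if pending_unvisited:
        # An earlier link has not been visited yet: stay quiet.
        return False
    if not recipient_looks_real(history, now):
        # Probably a dictionary-attack address on a gateway domain.
        return False
    return any(in_grey_zone(w) for w in held_weights)
```

The token comes from Python's `secrets` module because the link is the only access protection, so it has to be unguessable. Tracking visits (the `pending_unvisited` flag here) would need storage on the website side, which this sketch leaves out.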
All this work with the notifications has the following
benefits:
-
not we but our customers can decide what's ham and what's spam (at least
in the mentioned grey zone)
-
customers can see our service
-
we have a copy of each "false positive" and can concentrate our work on
preventing this in the future, besides the work of keeping the 120-170% zone
as clean of messages as possible in order to reduce the review work of our
customers (for example with my AVFILTER-COMBO
test)
At the moment I'm working on this and so many ideas are
still theory, but I'm happy for any feedback.
Markus
Markus,
I have found that my users miss about 99% of the
false positives using a system where I set up review accounts in Web-mail for
each domain and only capture less than 2% of their blocked volume for them to
review. Reprocessing and reporting the message is done with a single
click using a link that I added to the interface for this purpose. I
know that they miss this much because we also do review for the hold range
across our entire user base, however we don't guarantee in any way that we
will find every false positive or review this with specific regularity.
Obviously as volume increases, so does the work required for us to do this,
but it is quite easy for all but a couple of our domains to be reviewed
because the number of held messages is generally below 20 a day, and only 7
days are kept.
I too am looking to move to a 'push' format, figuring
that if you deliver a message daily to each domain's administrator showing
this small sampling where false positives are almost exclusively held, I will
dramatically increase the amount of user feedback and more importantly, lessen
the dependence on us. I have only had one customer that was ever upset
about false positives, and this customer dropped us. The issue there was
that the domain owner's wife was very big on free-deals sites, and their daily
E-mails were often being blocked and they never gave us enough time to clean
up all of the issues. Personally, I don't feel that our service is
appropriate for people that value such things so highly, especially since so
much of it is associated with spam (shared or brokered lists).
So
having this Web-mail review for each domain has in fact provided us with
feedback from those few that feel that this is important to them. I have
found that the people driven enough to do the review do in fact often report
false positives for sources like eWeek and Orbitz, even things as pointless as
surveys. I do very much appreciate the feedback, and I have killed
entries from my own blacklist repeatedly as a result of these reports after
finding that people did want to make their own choices with the tertiary
stuff. Since they are also generally tech types, they favor tech
content, probably due to familiarity as well as favorites. Those that
don't regularly do review however are much less likely to report advertising
or low-value newsletters/subscriptions as being false positives. These
types are also strangely enough much more likely to report a phishing attempt
as a false positive, and that has happened 3 times that I can recall (I'm
improving my phishing filters to get this stuff deleted more often instead of
just held). So the gist of this is that I get the feeling that just like
us, the administrators of these individual domains have their own
sub-conscious rules that they use.
This is all a bit secondary to my
original inquiry however. What I was really interested in was what rules
people like yourself use in determining whether something is ham or
spam.
Thanks,
Matt
Markus Gufler wrote:
I'm close to finishing a reporting tool that will send out
a daily notification to the local recipient if new messages were held on the
mailserver with a final weight slightly above the hold weight (up to now we
have reviewed these messages regularly and find an average of one false positive
each day per roughly 15k delivered messages).
The notification contains only a link to a webpage
where the user can see his held messages and click on them to requeue
them.
I'm curious what my customers will consider "not
spam" :-)
Markus
This was the subject of a recent off-list
discussion between myself and Pete where there was a perception that my
definition of spam was too conservative or rather my definition of ham was
too liberal. While I readily admit that in practice, I do
personally wish to block many fewer things that I consider to be
legitimate first-party advertising than most do, I don't necessarily get
the impression that the definitions that I use are all that much off the
mark. I have also found that the folks at BondedSender think that I
am some sort of anti-advertising zealot for reporting what is near
universally what we would consider to be spam, so it does go both ways
:) So I wanted to throw this topic out for some feedback and other
presentations of one's own definitions and maybe learn something in the
process.
First off, I naturally follow the basic definition of spam
that is widely promoted where spam is both unsolicited and
bulk. What causes such wide deviation from this common definition
however is the sub-definition of what constitutes unsolicited, and the
gray area that exists beyond this definition due to abuse.
The
definition that I use to qualify advertising or newsletter related ham is
as follows:
This definition starts with me treating things as ham if they
come from a first-party relationship with the sender, however there are
some exceptions as follows:
- Evidence of the first-party having harvested significant numbers
of recipients in the list, i.e.: Reunion(dot)com.
- Refusal to honor opt-outs.
- Having no opt-out mechanism for repeated E-mails that are
advertising related.
- Third-party ads being sent by a first-party source when they are not
the primary reason for a membership, example: Sportsline's partner
specials.
- Very widespread abuse of a particular direct-marketing provider
where most customers of a service are spamming, example: Uptilt.
- Selling subscriber lists from one otherwise legitimate site to
spammers or brokering lists for spamming, example: many joke sites.
It's my belief that many would consider this
definition to be agreeable (please speak up if you don't), however I am
near certain that in practice there is a good amount of deviation from
this even among those that would at least initially agree with the
above.
The issue of applying this in practice to me means that I
try not to apply my own emotions or judgments of value to a particular
sender. This means that I treat advertising from J.Crew just the
same on my system as E-mail from Orbitz, though I personally find Orbitz
and most other travel sites to be annoying with their frequency and low in
value to the recipient. The trick here is that I have found no
evidence of harvesting from either source, and they both practice
default-opt-in to their newsletters from their customer-base, and they
both seemingly honor opt-outs, so the only difference that I perceive is
the subject matter of the E-mails. I have found that many
administrators will blacklist Orbitz and even report them to SpamCop,
while this is less commonly the case with J.Crew. So the determining
factor that is often used regardless of a stated or intended definition
appears to be a value judgment placed on the content of the E-mails,
either consciously or unconsciously. Would anyone agree or disagree
with this perception?
One last note: personally I find the industry
standard practice of default-opt-in for customer lists to be disturbing,
but if one was to consider that alone as a qualifier of spam, over 99% of
advertising messages that pass my definition above would fail the much
tighter definition of double-opt-in for requesters only. Since this
has become the standard practice in the entire industry, I allow for it
just so long as they follow my rules since I definitely have customers
(including myself) that do wish to receive some of what is sent to me
without initially requesting it, and my customers have the power to
opt-out and report any abuse to me for appropriate action.
Please
add your comments or even your own
definitions.
Thanks,
Matt
--
=====================================================
MailPure custom filters for Declude JunkMail Pro.
http://www.mailpure.com/software/
=====================================================