On Tue, Nov 11, 2003 at 04:44:20PM -0800, Marc Baber wrote:
>
>I would say that spam, or at least the set of e-mails that one might 
>want to be classified as spam, is *not* the same for everybody, in the 
>case of politically motivated spam filtering.  Because the "corpus" body 
>of collected spam e-mails is used to filter e-mail for all users, one 
>person's spam report can affect e-mail delivered (or not delivered) to a 
>large number of people.
>


>The reason I talk about lost e-mails is because my account was defaulted 
>into a "delete spam" mode when spam filtering was first introduced at 
>EFN and I never saw filtered spam until I specifically contacted EFN 
>personally to ask that my account be exempted from that default.  I have 
>no experience of receiving flagged spam as the "default" action for 
>EFN's spam filter.  I had to lose an airline reservation e-mail and at 
>least one job-seeking related e-mail before I became suspicious and 
>started asking questions to learn that my account was defaulted to "drop 
>spam silently".  That was very frustrating for me and has made me the 
>spam-filter-unfriendly guy I am today.
>

Hm.  There have been three periods in the history of efn's spam filters:

< 2002 DNSBLs coupled with local sendmail rules
        During this period, any email that was rejected by our servers
        would be bounced back to the sender, thereby meeting the RFC
        requirement to either deliver or account for every piece of mail.
        This was an increasingly labor-intensive solution, but it did
        not generate very many complaints to the volunteer postmaster.

2002   DNSBLs coupled with SpamAssassin, auto-deleting
        During this period, efn ran SpamAssassin, first on our main
        incoming mailserver, and later on two dedicated hosts.  
        Mail which was flagged as spam by SpamAssassin was automatically
        deleted.  Mail blocked by DNSBLs continued to be bounced.
        Reliability, both of the mailsystem as a whole, and of the spam
        filter in particular, became embarrassingly bad.  If i recall
        correctly, it was during this period that your missing mail
        episodes happened.  I apologize for, and continue to be ashamed
        about, our mail performance during this period, but there was
        really nothing more i could have done to fix it than i did, and
        the problem was essentially political.  You are not the only one
        to be wary of SpamAssassin on the basis of such experiences; our
        debacle caused UO to become very wary of any further experiments
        with SpamAssassin.

2003 > DNSBLs coupled with SpamAssassin, flagging
        In early 2003 we experimented with bouncing back to the
        sender mail which was flagged as spam by SpamAssassin.
        This resolved the RFC-compliance problem, but did little
        to improve the reliability issue.  Since then we have
        been delivering messages with SpamAssassin's flags attached
        (DNSBL rejects are still bounced).  If people opt
        to auto-delete flagged spam at delivery time, we do that
        for them on a user-by-user basis with .procmailrc configuration.
        We aim to enhance this mail system further with individual
        user configurability.
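
As a rough illustration of the per-user choice described above, here is
a minimal Python sketch.  It is not our actual .procmailrc setup.  It
assumes only that SpamAssassin marks flagged mail with an
"X-Spam-Flag: YES" header; the USER_PREFS table, the user names and
the function names are hypothetical.

```python
from email import message_from_string

# Hypothetical per-user preferences: "delete" drops flagged mail,
# anything else (or no entry at all) delivers it with the flag intact.
USER_PREFS = {"alice": "delete", "bob": "deliver"}


def is_flagged(raw_message: str) -> bool:
    """True if the message carries SpamAssassin's X-Spam-Flag: YES header."""
    msg = message_from_string(raw_message)
    return msg.get("X-Spam-Flag", "").strip().upper() == "YES"


def delivery_action(user: str, raw_message: str) -> str:
    """Return "deliver" or "discard" for one message and one recipient."""
    if not is_flagged(raw_message):
        return "deliver"
    if USER_PREFS.get(user, "deliver") == "delete":
        return "discard"
    # Default: flagged mail is still delivered, so nothing vanishes silently.
    return "deliver"
```

The point of the default branch is that a user who never expressed a
preference keeps receiving flagged mail rather than losing it.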

>If there is to be a central "corpus" of spam for all users, I'd like to 
>see some accountability and transparency:
>
>1. Who makes the final decision if an e-mail submitted to [EMAIL PROTECTED] 
>or [EMAIL PROTECTED] is included in the "corpus" as such.  What are the 
>relevant policies?  Is it automated or staffed?

At the moment, any email which is submitted as spam, and is recognized
by the postmasters as a sample email (rather than, say, a request for
whitelisting or tech support) is queued for eventual inclusion in
the Bayesian filter.  The sender's report is considered sufficient
evidence that the mail in question is indeed spam or tofu.  At present
none of it is committed to the Bayesian filter, which learns only
by auto-learning from the mail that it examines as it goes.

>2. The "corpus" should be in an open web directory that is searchable. 

That is an interesting idea; i'm not sure how we'd implement it (MySQL?).
We are talking about millions of messages here.
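
To make the idea concrete, here is a small sketch of a searchable
corpus table.  It uses Python's built-in sqlite3 module purely as a
stand-in for whatever database we might really use; the table layout
and the function names are assumptions, not anything that exists at
efn today.

```python
import sqlite3
from email import message_from_string


def open_corpus(path=":memory:"):
    """Open (or create) a corpus database with one row per sample."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS corpus ("
        "id INTEGER PRIMARY KEY, label TEXT NOT NULL, "
        "subject TEXT, body TEXT)"
    )
    return db


def add_sample(db, label, raw_message):
    """Store one submitted sample under the label 'spam' or 'tofu'."""
    msg = message_from_string(raw_message)
    payload = msg.get_payload()
    # Multipart messages are skipped here to keep the sketch short.
    body = payload if isinstance(payload, str) else ""
    cur = db.execute(
        "INSERT INTO corpus (label, subject, body) VALUES (?, ?, ?)",
        (label, msg.get("Subject", ""), body),
    )
    db.commit()
    return cur.lastrowid


def search(db, term):
    """Return (id, label, subject) for samples mentioning the term."""
    pattern = f"%{term}%"
    rows = db.execute(
        "SELECT id, label, subject FROM corpus "
        "WHERE subject LIKE ? OR body LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return rows.fetchall()
```

A LIKE scan of this kind reads every row, so with millions of messages
a real deployment would need a proper full-text index instead.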

> When the SpamAssassin says something is spam, there should be links to 
>the reference e-mails in the corpus that were correlated with the spam, 
>upon request, so a user can review whether the items in the corpus are 
>"objective" spam or "subjective" spam.  The individual must have a way 
>of reviewing the decisions or processes that contribute to the corpus.
>

That looks to be Very, Very Hard to do.  I don't know that SpamAssassin
has any sort of support for audit trail in its Bayesian mechanism,
and i would expect including it to significantly increase both the 
CPU cycles and the disk space needed to manage a mailstream as 
large as efn's.  It might be easier to do per-user Bayesian filters,
or perhaps to have a Spam Committee which must approve inclusion
of messages into the spam or tofu pools.
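
To show where the extra cost would come from, here is a toy Bayesian
scorer that keeps an audit trail.  It is a sketch under stated
assumptions and is not SpamAssassin's algorithm: the AuditedBayes class
and its methods are hypothetical, the smoothing is the simplest
add-one kind, and class priors are ignored.

```python
import math
import re
from collections import defaultdict


class AuditedBayes:
    """Toy token filter that remembers which spam samples taught it each token."""

    def __init__(self):
        self.counts = {"spam": defaultdict(int), "tofu": defaultdict(int)}
        self.totals = {"spam": 0, "tofu": 0}  # messages learned per pool
        self.sources = defaultdict(set)       # token -> ids of spam samples

    @staticmethod
    def tokens(text):
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def learn(self, msg_id, label, text):
        """Add one sample to the 'spam' or 'tofu' pool."""
        self.totals[label] += 1
        for tok in self.tokens(text):
            self.counts[label][tok] += 1
            if label == "spam":
                # This set is the audit trail, and it grows with every sample.
                self.sources[tok].add(msg_id)

    def score(self, text):
        """Return (log-odds of spam, {token: sorted ids of spam samples})."""
        log_odds = 0.0
        evidence = {}
        for tok in self.tokens(text):
            p_spam = (self.counts["spam"][tok] + 1) / (self.totals["spam"] + 2)
            p_tofu = (self.counts["tofu"][tok] + 1) / (self.totals["tofu"] + 2)
            log_odds += math.log(p_spam / p_tofu)
            if tok in self.sources:
                evidence[tok] = sorted(self.sources[tok])
        return log_odds, evidence
```

A plain filter stores only the two counts per token.  The sources map
stores a message id for every token of every spam sample, which is
where the additional disk space would go.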

I'd also like to touch on the theme that i've heard, e.g., attributed
to John Gilmore, that spam filtration is an intolerable censorship
of free speech.  One would think that the experience of the last
year would have rendered this a dead issue, but i am told that he
was holding forth on this theme at LISA just two weeks ago.

EFN's SpamAssassin filter is currently marking about 40% of its
traffic as spam.  We know that we are also bouncing thousands
of messages every day which SpamAssassin never sees, and that unflagged
spams are making it past the filter.  Spam is approaching 50% of
mail traffic, and is rendering email an unusable medium for 
meaningful discourse.  If a vigorous effort is not made to 
thwart spam, we'll wind up with absurdities like the 'free speech'
plaza at Saturday Market where speech is perpetually drowned
out by drumming.  Those who know will use other media to
communicate (e.g., ASL) and those who do not will be left silent.
I can't see that as any sort of victory for freedom.

-- 
"That time in Seattle... was a nightmare.  I came out of it dead broke,
without a house, without anything except a girlfriend and a knowledge of
UNIX."  "Well, that's something," Avi says.  "Normally those two are
mutually exclusive."                    --Neal Stephenson, "Cryptonomicon"
_______________________________________________
EuG-LUG mailing list
[EMAIL PROTECTED]
http://mailman.efn.org/cgi-bin/listinfo/eug-lug
