Hi,
Four people have checked the spam web form submissions concerning
debian-project. More background can be found at [1]. Thanks to Bas
Wijnen, Paul Wise, and Richard Hecker for reviewing! (Of course, a
special mention to Y Giridhar Appaji Nag who already looked through
debian-devel, but that isn't ripe for action yet.)
Proposal
--------
I propose to remove the 436 messages unanimously classified "spam" from
the web archive.[2]
Note that these will remain available to Developers on master.debian.org,
and messages will be reincluded if the Listmaster receives complaints
about an erroneous removal, as discussed at [1] (Policy corner stones).
Some statistics
---------------
Number of messages by the set of distinct classification responses
they received (the four possible responses are explained at [1]):
839  submissions reviewed

436  spam
225  not spam
  6  inapp
  1  unknown
 68  unknown, spam
 33  unknown, not spam
 18  inapp, spam
  9  unknown, inapp
  3  not spam, inapp
 17  unknown, spam, inapp
  8  unknown, not spam, spam
  5  not spam, spam
  2  spam, not spam, inapp
  4  unknown, not spam, inapp
  4  unknown, inapp, not spam, spam
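
As a cross-check, the tally above can be re-added with a few lines of
Python. This is only a sketch (written for Python 3, and not the
comparison script mentioned in [2]); the variable names are made up for
illustration. It confirms that the categories sum to the number of
reviewed submissions and counts the messages that received both "spam"
and "not spam" responses.

```python
# Tally copied from the statistics above: each key is the set of distinct
# responses a message received, each value the number of such messages.
tally = {
    ("spam",): 436,
    ("not spam",): 225,
    ("inapp",): 6,
    ("unknown",): 1,
    ("unknown", "spam"): 68,
    ("unknown", "not spam"): 33,
    ("inapp", "spam"): 18,
    ("unknown", "inapp"): 9,
    ("not spam", "inapp"): 3,
    ("unknown", "spam", "inapp"): 17,
    ("unknown", "not spam", "spam"): 8,
    ("not spam", "spam"): 5,
    ("spam", "not spam", "inapp"): 2,
    ("unknown", "not spam", "inapp"): 4,
    ("unknown", "inapp", "not spam", "spam"): 4,
}

# Sum of all categories; should equal the number of submissions reviewed.
total = sum(tally.values())

# Messages on which every response was "spam" (the removal candidates).
unanimous_spam = tally[("spam",)]

# Messages with a detected disagreement: both "spam" and "not spam" given.
conflicting = sum(n for responses, n in tally.items()
                  if "spam" in responses and "not spam" in responses)

print(total, unanimous_spam, conflicting)
```

The conflicting count is the number of messages that need a closer
look; everything proposed for removal comes from the unanimous "spam"
category.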
Analysis of the debian-project review
-------------------------------------
We should be most concerned about the messages with (detected) errors,
namely those where the answers contain both "spam" and "not spam", so
below are the message-ids (best used in conjunction with [3]) and some
analysis of the nature of these messages.
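
For convenience, a message-id can be turned into a lookup URL using the
template given in [3]. The following is a minimal Python 3 sketch; the
helper name msgid_url, the stripping of angle brackets, and the
percent-encoding are assumptions made for illustration (the service may
well accept other forms), and the message-id shown is hypothetical.

```python
from urllib.parse import quote

# URL template taken from footnote [3].
MSGID_SEARCH = "http://lists.debian.org/msgid-search/%s"


def msgid_url(message_id):
    """Build a msgid-search URL for a Message-ID (angle brackets optional)."""
    # Assumption: the service expects the id without surrounding <...>.
    bare = message_id.strip().lstrip("<").rstrip(">")
    # Assumption: percent-encode unusual characters, but keep "@" readable.
    return MSGID_SEARCH % quote(bare, safe="@")


# Hypothetical Message-ID, for illustration only.
print(msgid_url("<20071101120000.GA1234@example.org>"))
```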
While an error estimate would be nice to have, the naive approach is
based on an independence assumption that seems to be very wrong in our case.
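
To make the caveat concrete, here is one reading of the naive approach
(an assumption, since it is not spelled out here): if every reviewer
wrongly answered "spam" with the same probability and independently of
the others, the chance of a unanimous wrong "spam" verdict would be
that probability raised to the number of reviewers. The error rate
below is purely hypothetical and not measured from the review.

```python
def naive_unanimous_error(p_false_spam, reviewers):
    """Chance that all reviewers wrongly say "spam", assuming independence."""
    return p_false_spam ** reviewers


# Hypothetical per-reviewer error rate, chosen only for illustration.
p = 0.05
for n in (2, 3, 4):
    print(n, naive_unanimous_error(p, n))
```

The figure shrinks very quickly with each added reviewer, which is why
it is too optimistic whenever reviewers tend to stumble over the same
messages.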
I think that improved tools (in particular, quicker access to the web
pages with the "next in thread" links, or using the web page itself),
experience with the corner cases, and triple review (including an
experienced spam-checker) together strike a good balance between
reliability and effort. (I would even claim that nothing of particular
value received two spam votes, but we want to be sure and lose as
little as possible.)
hecker     pabs       tviehmann  wijnen
--- one spam vote
not spam   inapp      unknown    spam
    [EMAIL PROTECTED]
    a request to remove stuff from the archive
spam       not spam   not spam   inapp
    [EMAIL PROTECTED]
    a German user complaining about Debian CDs he bought elsewhere
spam       unknown    not spam   unknown
    [EMAIL PROTECTED]
    an Italian user question
not spam   unknown    unknown    spam
    [EMAIL PROTECTED]
    someone complaining about ICQ spam matching some list spam
spam       unknown    not spam   inapp
    [EMAIL PROTECTED]
    a German user looking for a translation program
spam       not spam   not spam   not spam
    [EMAIL PROTECTED]
    a complaint about IRC in response to a DWN article
spam       unknown    not spam   inapp
    [EMAIL PROTECTED]
    a Portuguese user question
spam       not spam   not spam   inapp
    [EMAIL PROTECTED]
    a German (Swiss) request to be sent a t-shirt to match the swirl
    on his motor scooter
spam       not spam   not spam   not spam
    [EMAIL PROTECTED]
    a French and English user question
spam       not spam   unknown    not spam
    [EMAIL PROTECTED]
    start of a troll thread
spam       not spam   unknown    not spam
    [EMAIL PROTECTED]
    further down that troll thread
not spam   not spam   not spam   spam
    [EMAIL PROTECTED]
    an offer to redesign our web site, possibly serious
spam       unknown    not spam   inapp
    [EMAIL PROTECTED]
    a Spanish user question
not spam   unknown    unknown    spam
    [EMAIL PROTECTED]
    a Linux portal announcement at least bordering on spam
--- two spam votes
spam       unknown    not spam   spam
    [EMAIL PROTECTED]
    a Polish user question
spam       unknown    not spam   spam
    [EMAIL PROTECTED]
    someone looking (in a strange way) for someone with the same
    name as a Debian contributor who has some 256 posts on our
    English-language lists between 1999/09 and 2001/10
spam       spam       not spam   unknown
    [EMAIL PROTECTED]
    a Spanish unsolicited software survey not directly related to
    Debian
--- three spam votes
spam       not spam   spam       spam
    [EMAIL PROTECTED]
    a Croatian (one-liner) user question
--- unquestionably spam
not spam   spam       spam       spam
    [EMAIL PROTECTED]
    link request spam
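
The listing can be checked against the statistics mechanically. The
sketch below (Python 3; the vote rows are transcribed by hand from the
listing, in order, and the variable names are illustrative) groups the
conflicting messages by number of "spam" votes and by set of distinct
responses. Note that the "unquestionably spam" entry also carries three
spam votes, so it is counted together with the "three spam votes" entry.

```python
from collections import Counter

# Columns: hecker, pabs, tviehmann, wijnen (as in the listing).
S, N, I, U = "spam", "not spam", "inapp", "unknown"
rows = [
    (N, I, U, S), (S, N, N, I), (S, U, N, U), (N, U, U, S), (S, U, N, I),
    (S, N, N, N), (S, U, N, I), (S, N, N, I), (S, N, N, N), (S, N, U, N),
    (S, N, U, N), (N, N, N, S), (S, U, N, I), (N, U, U, S),  # one spam vote
    (S, U, N, S), (S, U, N, S), (S, S, N, U),                # two spam votes
    (S, N, S, S),                                            # three spam votes
    (N, S, S, S),                                            # unquestionably spam
]

# How many conflicting messages received 1, 2 or 3 "spam" votes.
by_spam_votes = Counter(row.count(S) for row in rows)

# How many conflicting messages received each set of distinct responses;
# these should match the corresponding lines of the statistics.
by_response_set = Counter(frozenset(row) for row in rows)

print(len(rows), sorted(by_spam_votes.items()))
for responses, n in by_response_set.items():
    print(n, ", ".join(sorted(responses)))
```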
Kind regards
Thomas
1. http://wiki.debian.org/Teams/ListMaster/ListArchiveSpam
and originally, with followups, on this mailing list
http://lists.debian.org/debian-project/2007/11/msg00012.html
2. In master.d.o:~tviehmann/spam-removals/ you will find the
"reports", the "proposed" removals, and the Python (>=2.4) script
comparing them. The .spam files actually used reside with the
mbox archives on master:/org/lists.debian.org/lists/; at
present they list only four spams removed by the Listmaster.
3. http://lists.debian.org/msgid-search/
use http://lists.debian.org/msgid-search/%s for quick bookmarks
--
Thomas Viehmann, http://thomas.viehmann.net/