Hello everyone,

I'm a new ASSP user. I have set up one production proxy and
am preparing another, and in the meantime I found that
sorting mail into spam and ham is rather tedious. Not
everyone can just copy their own mailbox as the initial
learning corpus for ASSP: some of us set it up mainly for
other people, so it is their mail that makes the right corpus.

Plus, I gather that from time to time one has to retrain ASSP
for slightly different ham and spam characteristics, so one
has to manually sort at least the new spam all over again.
Is that correct?

Anyway, this little Bash script can make manual sorting of
mail for the ASSP learning corpus less laborious.

Since sorting moves mail to the folders sorted/spam,
sorted/notspam, etc., one can stop sorting at any time by
pressing Ctrl-C without losing any sorting results.
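
The interrupt-safety described above comes from moving each mail
as soon as it has been classified. Here is a minimal sketch of that
idea. It is not the actual spamsort code: the function name
file_mail and the one-letter answers "s" and "h" are assumptions
made up for illustration.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not taken from spamsort itself.
# file_mail MAILFILE ANSWER [SORTED_DIR]
# Moves MAILFILE into SORTED_DIR/spam or SORTED_DIR/notspam right away,
# so an interrupted session keeps everything sorted so far.
file_mail() {
    local mail=$1 answer=$2 sorted=${3:-sorted}
    local target
    case $answer in
        s) target=$sorted/spam ;;      # hypothetical answer key for spam
        h) target=$sorted/notspam ;;   # hypothetical answer key for ham
        *) return 1 ;;                 # unknown answer: leave the mail in place
    esac
    mkdir -p -- "$target" && mv -- "$mail" "$target/"
}
```

A sorting loop would call it once per mail, e.g.
file_mail "$m" "$answer", right after reading the answer, so
there is never any unsaved state for Ctrl-C to discard.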

If one runs

./spamsort --rate-collected-spam

before starting manual sorting, the script also displays
spam/ham probabilities and Bayesian confidence during sorting.

The caveat, however, is that spamsort obtains these values
from ASSP by using curl to connect to ASSP's administrative
interface. It works, but it's horribly slow: it takes over
1 second to rate one mail!

This is because the script opens a new HTTP connection, rates
one mail, and closes the connection, and then has to do the
same again for the next mail.

If anybody has a clue how to make curl use a persistent HTTP
connection to fetch mail probabilities faster, please raise
your hand.
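
One possible direction, sketched under assumptions: curl can reuse
a connection when several requests are made in a single curl
invocation, provided the server keeps the connection open (whether
ASSP's administrative interface does so is not confirmed here).
The helper below only generates a curl config file with one
request per mail. The function name make_curl_batch, the URL, and
the form field name "mail" are hypothetical placeholders, not
confirmed ASSP parameters.

```shell
#!/usr/bin/env bash
# Illustrative sketch only. The URL and the form field name "mail"
# are hypothetical placeholders, not confirmed ASSP parameters.
# make_curl_batch URL OUTDIR MAILFILE...
# Prints a curl config describing one POST per mail, so that a single
# curl process handles all of them instead of one process per mail.
# Note: file names containing double quotes or backslashes are not escaped.
make_curl_batch() {
    local url=$1 outdir=$2 first=1 mail
    shift 2
    for mail in "$@"; do
        # "next" separates the options of one request from the following one
        (( first )) || printf 'next\n'
        first=0
        printf 'url = "%s"\n' "$url"
        printf 'data-urlencode = "mail@%s"\n' "$mail"
        printf 'output = "%s/%s.html"\n' "$outdir" "${mail##*/}"
    done
}
```

The generated config could then be fed to one curl process, for
example: make_curl_batch "$url" ratings spam/* | curl -s -K -
Authentication options would have to be added to match the local
setup.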

Comments, improvements, and bug reports are of course
welcome.


Link to script:

http://www.wbp.krakow.pl/mk/spamsort


Usage:


spamsort v0.2 - Bash script for sorting spam collected by
Anti-Spam SMTP Proxy.

 Commandline options:

    --spam
    --ham

     Start manual sorting of spam or ham, respectively.


     --rate-collected-spam

     Calculate stats of spam in 'spam' folder of ASSP.
     WARNING: IT'S *VERY* SLOW AT THE MOMENT SINCE IT USES SEPARATE CURL
     CONNECTION TO ASSP FOR RATING EACH MAIL.


     --quiet

     Don't print stats for each mail during rating.


     --delete-high-rated-spam

     Moves highly spam-positive mail to sorted/spam folder.
     WARNING: YOU HAVE TO DO INITIAL ASSP LEARNING IN ORDER TO AVOID DELETING
     MANY FALSE POSITIVES IN SPAM FOLDER. USE THIS OPTION AT YOUR OWN RISK!!!

     

Regards,
Marcin Krol


_______________________________________________
Assp-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-user
