Re: [Dspam-user] A utility for parsing IMAP(UWash)-type folders to feed to dspam

Stevan Bajić Mon, 21 Dec 2009 11:35:49 -0800

On Mon, 21 Dec 2009 09:56:46 -0800
Christopher Jay Manders <[email protected]> wrote:


> Hi all,
> 
Hello Christopher,


> So, I have written this small C program/utility that I have been using
> against my user's 'Junk'  and 'Whitelist' folders respectively.
> 
So you are the author of dspam-trainer, which I recently saw on SF?


> The Dspam utilities that I have seen so far are based on procmailing
> your emails as separate files and parsing them individually.
>
That has not been true for over a year now. If you look at the CHANGELOG, you 
should spot this little entry:
---------------------------
[20080503:1400] mjohnson: Dspam train with MBOX files

Submitted by Vadim Zeitlin. Allows dspam_train to work with both maildir-like
directories and also MBOX folders.
---------------------------


> I use the
> UWash imap server which stores emails in files, one for each IMAP
> folder (e.g. Junk, Inbox, Trash, etc.) and so far I have not seen
> anything that can be used to bridge into Dspam.
> 
dspam_train has been able to do that since 3rd May 2008.
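
As an illustration of what handling both storage layouts involves, here is a
minimal sketch in Python. It is not how dspam_train itself is implemented; the
function name iter_messages is hypothetical, and it assumes the folder file
uses the traditional mbox layout with "From " separator lines and that a
"maildir-like" directory simply holds one message per file.

```python
import email
import mailbox
import os


def iter_messages(path):
    """Yield parsed messages from a one-file-per-message directory or an mbox file.

    Assumption: a directory holds one raw message per regular file, and a
    plain file is a traditional mbox with "From " separator lines.
    """
    if os.path.isdir(path):
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            if os.path.isfile(full):
                with open(full, "rb") as fh:
                    yield email.message_from_binary_file(fh)
    else:
        box = mailbox.mbox(path, create=False)
        try:
            for msg in box:
                yield msg
        finally:
            box.close()
```

A caller would loop over iter_messages("/path/to/Junk") and hand each message
to whatever training step it uses; the path here is only a placeholder.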


> So, my 1st question is: have I re-invented the wheel?
>
I have only had a quick look at dspam-trainer. I spotted it on 17th December 
this year: I was doing something on the SF home page of DSPAM, and SF showed 
a link to dspam-trainer on the right-hand side of the page. From what it claims 
to do, I would say that you have re-invented the wheel to a certain degree. 
Only to a certain degree, because dspam_train is made to bulk-train DSPAM, 
while your application takes the old signatures created when DSPAM first 
processed the messages and reclassifies them. That is something dspam_train 
does not do, but it would be easy to add to the current implementation. I, for 
example, have extended dspam_train to do many things that the original script 
does not do:
---------------------------
theia spam-stuff # ./dspam_train_tone_v5 --help
ERROR: spam corpus must be path to maildir directory or MBOX file.

Usage: ./dspam_train_tone_v5
  [[username]|[--user username]] User name to use for training
  [--client]                     To run in client mode
  [--random]                     Randomly process corpi
  [--refute]                     To unlearn errors from opposite class
  [--subject]                    To show subject from error/unlearn/TONE
  [--max-retrain max_retrain]    Maximum relearns per error/TONE
  [--spam-threshold threshold]   TONE Spam threshold
  [--ham-threshold threshold]    TONE Ham threshold
  [--overleap count]             Overleap certain count of messages
  [--stop-after count]           Stop after processed certain count of messages
  [[-i index]|[spam_dir] [nonspam_dir]]

theia spam-stuff #
---------------------------

For a programmer as skilled as you are, adding that reclassification 
functionality to the current dspam_train should be no issue.


> Is there a
> way/program already available that can be used to crontab the input
> provided by the other users of the system besides the one I have
> written? If so, any information would be a great help.
> 
dspam_train is your friend :)


> The 2nd question is in regards to my logic for the program I have
> written (assuming there is no other alternative as I did).
> The program essentially:
> opens the file arg provided.
> for every Dspam signature seen, feed to dspam
> close the filehandle
> Anyone see any drawbacks to this system?
> 
No. Other tools providing the same function set as yours (for example the 
Dovecot Anti-Spam tool) do more or less the same thing. The only difference I 
remember is that you rely on the signature being in the body, while the other 
implementations can cope with all possible locations of the signature (header, 
body or attachment).
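
To make the "all possible locations" point concrete, here is a small Python
sketch that looks for a signature in the header, the body and any attachment.
The header name X-DSPAM-Signature and the !DSPAM:...! body tag are assumptions
about a typical setup and should be checked against the local DSPAM
configuration; the function name find_signatures is hypothetical.

```python
import email
import re

# Assumed signature shapes; verify them against your own DSPAM configuration.
HEADER_NAME = "X-DSPAM-Signature"
BODY_TAG = re.compile(r"!DSPAM:([0-9A-Za-z,]+)!")


def find_signatures(msg):
    """Return signatures from header, body or attachments, without duplicates."""
    found = []
    header = msg.get(HEADER_NAME)
    if header:
        found.append(str(header).strip())
    # walk() visits every MIME part, so attachments are covered as well.
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True) or b""
        # latin-1 maps every byte to a character, so decoding never fails.
        for sig in BODY_TAG.findall(payload.decode("latin-1")):
            if sig not in found:
                found.append(sig)
    return found
```

Because get_payload(decode=True) undoes base64 and quoted-printable encoding,
a tag inside an encoded attachment is found the same way as one in plain text.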

Is there any reason why you wrote it in C rather than in a scripting 
language? I ask because such tools are often easier to maintain and extend 
when written in a scripting language. And you are not using libdspam; you are 
just wrapping calls to the DSPAM binary.
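
For comparison, a wrapper around the DSPAM binary is only a few lines in a
scripting language. The sketch below is in Python and is not taken from
dspam-trainer; the function names are hypothetical, and the option spellings
(--user, --class, --source=error, --signature) are assumed to match the
installed dspam and should be confirmed against its man page.

```python
import subprocess


def build_retrain_command(user, signature, klass, dspam_bin="dspam"):
    """Build the argument list for reclassifying one stored signature."""
    if klass not in ("spam", "innocent"):
        raise ValueError("klass must be 'spam' or 'innocent'")
    return [
        dspam_bin,
        "--user", user,
        "--class=" + klass,
        "--source=error",          # assumed: marks this as an error correction
        "--signature=" + signature,
    ]


def retrain(user, signature, klass, dry_run=True):
    """Run the command, or only return it when dry_run is set."""
    cmd = build_retrain_command(user, signature, klass)
    if not dry_run:
        # check=True raises CalledProcessError if dspam exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd
```

Passing the arguments as a list rather than a shell string avoids quoting
problems if a user name or signature ever contains unusual characters.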


> I spent a bunch of time debugging the critter to make it work nicely.
> It compiles on great on Linux, FreeBSD, Solaris and AIX
>
You have access to all those platforms? If so, might I impose on you in the 
future to test-compile DSPAM on those platforms and send us feedback about how 
well or badly it went?


> (others, too,
> I'd wager, since there are no non-standard functions in it).
> I am happy to release the source under GNU.
> Is this something someone might also be interested in? If so, just let me 
> know.
> 
> Other thoughts?
> 
> TIA!
> 
> Christopher
> 
-- 
Kind Regards from Switzerland,

Stevan Bajić
