Hello Steve,
I am a long-time user of dspam (I started with 3.6.x) and now I am
considering trying 3.9.0 Alpha.
I personally think that dspam should not do any special checking for
configuration sanity, but the docs should be updated with
sample configs for common scenarios, and some care should be taken to
help users choose the best tokenizer/algo/storage/trainmode for
their scenario.
Years ago I chose train on error, algorithm graham burton and
mysql storage.
I chose MySQL because I wanted daemon mode and it is thread
safe.
I chose train on error because I wanted to keep token data small.
Some info on this is on the dspam wiki, but it is generally hard to find
out which combination to start with.
I would like to install the new dspam on one of our filter servers, which
handles approximately 4000 hams and 3000 spams daily.
I'd probably use mysql again, because I know it quite well. I'd try the teft
training mode, since I do not expect a big load on this server.
I am not sure about the tokenizer and algo. My preference is to get a very
low number of false positives.
I think I should start with the osb or chain tokenizer, graham+burton
and markov.
--
Best regards
Josef Liška
CHL | system care
Phone: +420.272048055
Fax: +420.272048064
Mobile: +420.776026526 daily 9:00 - 17:30
Jabber: [email protected]
https://www.chl.cz/
Steve wrote:
Hello Tom,
Steve wrote:
I see a bunch of issues:
1) You should NOT use SBPH with the MySQL driver. The only driver able to
handle SBPH without issues is the hash driver. Please use something other than
SBPH. I would suggest OSB.
There seem to be multiple issues with some of these configuration
options, notably combinations of Algorithm, Tokenizer and the
various storage drivers. Maybe dspam could check for these and exit with
a "configuration error" when incompatible choices are made?
They are not incompatible by definition. It is more a technical limitation than a real hard incompatibility.
I understand that a "configuration error" would be nice and dandy, but where to start and where to
stop? I mean, which conditions should trap a "configuration error" and which ones should not? The
possibilities for making errors in the configuration are endless and a
procedure to check them all would be huge. Hard configuration errors are trapped anyway, and DSPAM will refuse
to run/work/process/whatever if you set some options. But the above mentioned condition is more or less a
technical limitation, and as time passes it could easily happen that a new release of the RDBMS could cope
with the load of SBPH.
I personally would not extend the current code to check each and every (stupid)
combination of options. Some things are better left to be done outside by
humans than internally by code. But that's my personal opinion. I don't own
DSPAM. It's a tool for us all. If the majority of the user base would like to
have that extended error checking then we could look to implement it in the
next release after 3.9.0. Depending on how the checking is done, however, it
could eat some processing time when using
DSPAM. For me that would be a big no-go. I am all for trapping hard
configuration errors but leaving the soft configuration errors to be resolved
by the user/admin using DSPAM.
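For illustration only, here is a minimal sketch of how such a soft check could
live outside DSPAM, so that it costs nothing at processing time. It is written
in Python and is not part of DSPAM. The names SOFT_RULES and check_config are
hypothetical, and the only rule in the table is the SBPH limitation mentioned
above; the lowercase tokenizer and driver labels are just labels for this
sketch, not dspam.conf syntax.

```python
# Hypothetical external sanity check; not DSPAM code.
# Maps a tokenizer to the storage drivers said (in this thread) to handle it
# without issues, plus a hint to show the admin.
SOFT_RULES = {
    "sbph": ({"hash"}, "only the hash driver handles SBPH without issues; consider OSB"),
}


def check_config(tokenizer, storage_driver):
    """Return a list of warnings for soft configuration problems."""
    warnings = []
    rule = SOFT_RULES.get(tokenizer.lower())
    if rule is not None:
        allowed_drivers, hint = rule
        if storage_driver.lower() not in allowed_drivers:
            warnings.append(
                f"tokenizer={tokenizer} with storage={storage_driver}: {hint}"
            )
    return warnings


if __name__ == "__main__":
    # Example: the combination discussed above produces one warning.
    for warning in check_config("sbph", "mysql"):
        print("WARNING:", warning)
```

Anything not listed in the table passes silently, so the admin decides which
soft limitations are worth warning about and the filter itself stays untouched.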
--
Regards,
Tom
Steve
------------------------------------------------------------------------------
_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user