On 14/08/2012 05:05 p.m., Stevan Bajić wrote:
On 14.08.2012 21:05, Cristian wrote:
On 14/08/2012 03:42 p.m., Stevan Bajić wrote:
On 14.08.2012 19:32, Cristian wrote:
[....]
[....]
I run daily:
dspam_merge user1 user2 user3 -o globaluser
The globaluser is the merged group. Does this command really
put the users' data into the globaluser?
It should. That's at least what the command dspam_merge is
supposed to do.
So when I delete the users' data, does DSPAM read the
data from the globalgroup?
Yes. The data is (should be) in the globalgroup. And it is
read from there. But look at this (as an example).
globalgroup has 1'000'000 tokens and 100'000 processed
messages
user data has 1'000 tokens and 100 processed messages
Now assume an inbound message has 100 tokens and all 100 of them
get a hit in globalgroup: then you have 100 tokens out of
1 million. Now assume the same 100 tokens all hit the user
data: then you have 100 tokens out of 1'000 tokens. I think
you don't need much mathematical knowledge to understand that
the user hit has more weight than the same hit on the
globalgroup.
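The arithmetic in this simplified example can be sketched as below. The function name hit_fraction is hypothetical, and the ratio only illustrates the simplified view described above; it is not DSPAM's actual scoring.

```python
def hit_fraction(matched_tokens: int, total_tokens: int) -> float:
    """Share of a token store that one message's hits represent.

    Simplified illustration only; DSPAM does not score messages this way.
    """
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return matched_tokens / total_tokens


# Numbers taken from the example: 100 matching tokens in each store.
global_share = hit_fraction(100, 1_000_000)  # globalgroup
user_share = hit_fraction(100, 1_000)        # single user

print(f"globalgroup share: {global_share:.4%}")
print(f"user data share:   {user_share:.4%}")
print(f"user share is about {user_share / global_share:.0f} times larger")
```

Under this simplified view, the same 100 hits make up a far larger share of the small user store than of the large merged store.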
Understood, but does DSPAM only work by comparing the tokens that
get hits against the total number of tokens? Is that the only way?
What I wrote is a simplified view. If you want to understand how
things in DSPAM work then you should read about Bayes' theorem and
probability. For example:
http://en.wikipedia.org/wiki/Bayes%27_theorem
http://en.wikipedia.org/wiki/Bayesian_probability
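As a rough illustration of what those pages describe, here is Bayes' theorem applied to a single token. The function name and the example frequencies are hypothetical, and this is a textbook sketch, not DSPAM's implementation, which combines many tokens and applies further processing.

```python
def spam_probability(p_token_given_spam: float,
                     p_token_given_ham: float,
                     p_spam: float = 0.5) -> float:
    """P(spam | token) by Bayes' theorem for one token.

    p_token_given_spam: fraction of spam messages containing the token
    p_token_given_ham:  fraction of non-spam messages containing the token
    p_spam:             prior probability that a message is spam (assumed)
    """
    p_ham = 1.0 - p_spam
    numerator = p_token_given_spam * p_spam
    denominator = numerator + p_token_given_ham * p_ham
    if denominator == 0:
        # Token never seen in either class: fall back to the prior.
        return p_spam
    return numerator / denominator


# Hypothetical frequencies: token appears in 20% of spam and 1% of ham.
p = spam_probability(0.20, 0.01)
print(f"P(spam | token) = {p:.3f}")
```

The result depends on the per-class frequencies, so the same number of hits can carry different weight depending on how much data sits behind them.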
This is ok?
Yes. It is okay.
This is bad?
No. It is not bad. IMHO it is unusual to do that daily merge
but why not?
How do I train the global merged group?
With dspam_train maybe?
But can dspam_train train from what other users have learned?
It depends on how you present or make available the data to dspam_train.
etc...
Remember that I store the messages in mdbox format, so
dspam_train can't read them directly from the local disk.
dspam_train CAN read directly from the local disk. Unfortunately
dspam_train does not know how to handle the multi-dbox format. However...
DSPAM is open source and you can easily extend dspam_train to
handle the mdbox format.
This is not really a user,
This does not matter for DSPAM.
and the accounts are stored in mdbox format. The idea
is to train the globalgroup from the trained users.
Then maybe using a managed group would be better?
The issue is that I need general rules that give a new user good
spam filtering from the start, while still letting the user correct
any false positives.
How is retraining done on your setup? DSPAM Web-UI? Other ways?
What would that be?
I use the Dovecot plugin to train DSPAM.
I will need to manage approx. 10,000 users, so I need a solution
that doesn't require a big database for every user.
A global merged group is a good way to reduce the overall database
size. Do those 10K users have roughly the same type of mail? Same
language? Same sort of data?
OSB is btw another way to reduce size. Running the cleaning job
daily or using the dspam_maintenance script is another way of keeping
the data consumption low.
Yes, the users have the same language, type of mail, etc. Currently
I use OSB. But I can't imagine a database with 10,000 users without a
global user. I have no experience with how much data MySQL can hold
in the database.
But I can't tell whether the steps I use to train DSPAM are right. If I
use dspam_merge and then clean the users' data, why doesn't this
work well? For example, I clean the whole database, and within a few
hours DSPAM filters 30% of the users' spam. Then I merge the
data and clear the users, and it works worse. A user with 200,000
tokens works fine, but a globaluser with 200,000 tokens doesn't, even
though the tokens come from users with the same type of spam.
Is it possible that something is wrong in the global setup?
Cristian.
--
Kind Regards from Switzerland,
Stevan Bajić
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user