On 14.08.2012 21:05, Cristian wrote:
On 14/08/2012 03:42 p.m., Stevan Bajić wrote:
On 14.08.2012 19:32, Cristian wrote:
[....]
[....]
I run this daily:
dspam_merge user1 user2 user3 -o globaluser
globaluser is the merged group. Does this command really put the
users' data into globaluser?
It should. That's at least what the command dspam_merge is supposed
to do.
So when I delete the users' data, does DSPAM read the data from
the globalgroup?
Yes. The data is (should be) in the globalgroup. And it is read from
there. But look at this (as an example).
globalgroup has 1'000'000 tokens and 100'000 processed messages
user data has 1'000 tokens and 100 processed messages
Now assume an inbound message has 100 tokens and all 100 of those
tokens get a hit in the globalgroup: then you have 100 tokens out of
1 million. Now assume all 100 tokens hit the user data: then you have
100 tokens out of 1'000 tokens. I think you don't need much
mathematical knowledge to understand that the user hit has more
weight than the same hit on the globalgroup.
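
To put numbers on the simplified comparison above, here is a minimal Python sketch. It only computes the share of each token store that the 100 hits represent, using the example figures from this thread. The function name hit_share is made up for illustration, and this is not how DSPAM actually scores a message.

```python
from fractions import Fraction


def hit_share(hits: int, total_tokens: int) -> Fraction:
    """Share of a token store that is matched by the inbound message."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return Fraction(hits, total_tokens)


# Example figures from the thread: 100 hits against each store.
global_share = hit_share(100, 1_000_000)  # globalgroup: 1'000'000 tokens
user_share = hit_share(100, 1_000)        # user data: 1'000 tokens

# How much larger the user share is than the globalgroup share.
ratio = user_share / global_share

print(f"globalgroup share: {float(global_share):.4%}")
print(f"user data share:   {float(user_share):.4%}")
print(f"user share is {ratio}x the globalgroup share")
```

In this simplified view the same 100 hits cover a share of the user data that is a thousand times larger than the share they cover in the globalgroup.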
Understood, but does DSPAM work only by comparing the tokens that get
hits with the total number of tokens? Is that the only way?
What I wrote is a simplified view. If you want to understand how things
in DSPAM work then you should read about Bayes' theorem and probability.
For example:
http://en.wikipedia.org/wiki/Bayes%27_theorem
http://en.wikipedia.org/wiki/Bayesian_probability
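
As a rough illustration of what those articles describe, the following Python sketch applies the textbook form of Bayes' theorem to a single token. The function name, the equal prior and the example frequencies are assumptions chosen for illustration. DSPAM's own algorithms and its way of combining tokens are more involved and are not reproduced here.

```python
def spam_probability(p_token_given_spam: float,
                     p_token_given_ham: float,
                     prior_spam: float = 0.5) -> float:
    """P(spam | token) from Bayes' theorem for one token.

    p_token_given_spam: fraction of spam messages containing the token
    p_token_given_ham:  fraction of ham messages containing the token
    prior_spam:         assumed prior probability that a message is spam
    """
    numerator = p_token_given_spam * prior_spam
    denominator = numerator + p_token_given_ham * (1.0 - prior_spam)
    if denominator == 0:
        raise ValueError("token never seen in spam or ham")
    return numerator / denominator


# Hypothetical token seen in 80% of spam and 5% of ham.
p = spam_probability(0.80, 0.05)
print(f"P(spam | token) = {p:.3f}")

# A token that is equally common in spam and ham carries no information:
# the result is just the prior.
neutral = spam_probability(0.30, 0.30, prior_spam=0.2)
print(f"P(spam | neutral token) = {neutral:.3f}")
```

The point of the second call is that the token counts only matter relative to how often the token shows up in each class, which is why the size of the store a hit comes from changes its weight.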
Is this OK?
Yes. It is okay.
Is this bad?
No. It is not bad. IMHO it is unusual to do that daily merge but why not?
How do I train the global merged group?
With dspam_train maybe?
But can dspam_train train from what other users have already learned?
Depends on how you present or make the data available to dspam_train.
Remember that I store the messages in mdbox format, and
dspam_train can't read them directly from the local disk.
dspam_train CAN read directly from the local disk. Unfortunately
dspam_train does not know how to handle the multi-dbox (mdbox) format.
However... DSPAM is open source and you can easily extend dspam_train to
handle the mdbox format.
This is not really a user,
This does not matter for DSPAM.
and the accounts run in mdbox format. The idea is to train the
globalgroup from the trained users.
Then maybe using a managed group would be better?
The issue is that I need general rules that give a new user good
antispam filtering, but if the user gets false positives, he can fix them.
How is retraining done on your setup? DSPAM Web-UI? Other ways? What
would that be?
I will need to manage approx. 10,000 users, so I need a solution
that doesn't require a big database for every user.
A global merged group is a good way to reduce the overall database size.
Do those 10K users receive more or less the same type of mail? Same
language? Same sort of data?
OSB is, by the way, another way to reduce size. Running the cleaning job
daily or using the dspam_maintenance script is another way of keeping the
data consumption low.
Cristian.
--
Kind Regards from Switzerland,
Stevan Bajić
_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user