On 30.03.2012 15:44, René Neumann wrote:
On 30.03.2012 15:22, Stevan Bajić wrote:
I upgraded my dspam installation from 3.8.0 to 3.10.1 several weeks ago.
Since then the detection quality has dropped a lot: each day I have to
rescue several mails from Junk because they were wrongly classified as
spam -- this didn't happen with the old installation, and it does not
improve over time. Was it a mistake to keep the old database? Would it
be wise to drop the old data and retrain?
In your case: yes!
I say that because I see you are using CHAIN and you are using TEFT. If
you ask me, I would start from an empty database and would use OSB
and TOE.
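For reference, a minimal dspam.conf sketch of that suggestion. It assumes
a 3.10.x-style configuration file and a database that has already been
emptied; all other settings are left as they are.

```
# Sketch only: tokenizer and training mode as suggested above.
# Assumes the old token database was emptied before switching.
Tokenizer osb
TrainingMode toe
```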
But if I remember correctly, TOE should only be used if the database is
quite mature. Doesn't this hold anymore?
This does not hold any more.
And I forgot to mention, for most of the accounts this is actually set
to 'TUM' (that I -- from the description -- prefer).
TUM is okay. Anything other than TEFT.
Btw: For re-training: Is there some nice 'junk database' one could use
(for non-junk I can just use the current messages)? I know that when I
first installed DSPAM it took me quite a while to find such a junk
database -- but I forgot where it came from.
http://spamassassin.apache.org/publiccorpus/
http://plg.uwaterloo.ca/~gvcormac/treccorpus/
http://plg.uwaterloo.ca/~gvcormac/treccorpus06/
http://plg.uwaterloo.ca/~gvcormac/treccorpus07/
Let me know if you need more.
If you want my advice: Don't use any pre-training. It is almost useless.
Switch to the OSB tokenizer and let the engine do the rest. You will see
that you very quickly (waaaay quicker than before) reach a score
above 95%.
Or if you really want to do training then do it in conjunction with a
merged global group and train that. But I would not train individual users.
I know, I know. It sounds strange. But I have been there. I have trained
for weeks (in the old days when the systems were not that fast) and the
result of this insane training is: do not pre-train. It will eat a lot
of time and bring almost no benefit (with something modern like OSB in
conjunction with TOE/TUM, pre-training individual users is often even a
disadvantage).
Also I am a bit puzzled about the new configuration: Several options now
appear twice in the conffile: One time as a normal option and one time
as a 'Preference' parameter. It is not clear to me what takes precedence
or what happens if one of them is not set. Perhaps this influences the
problem above, as I might have conflicting options set this way. (Why
are these 'Preference' parameters there anyway?)
The entries without Preference are the globally valid entries. Preferences
are values that each user can have and can change (if you allow him/her
to change them).
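Whether a user may change a given preference is governed by AllowOverride
lines in dspam.conf. A minimal sketch, assuming you want users to be able
to choose their own training mode; the TUM default is only an
illustration:

```
# Site-wide default preference (applies unless a user overrides it)
Preference "trainingMode=TUM"
# Permit users to override this one preference
AllowOverride trainingMode
```

Without the AllowOverride line the site-wide value should stay in force,
but it is worth verifying that on your own version.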
So what is the actual effect of setting:
TrainingMode teft
Preference "trainingMode=TUM"
(and assuming no override is done by the user)
The user-level value (the Preference) has more weight than the global one.
And if I understand this correctly, I can drop any Preference thingy
that I don't want to be overridden by a user anyway?
Not really. There are just a bunch of values that are available in both
places. The ones that are NOT preferences are used by the DSPAM
agent/daemon, while the ones with the Preference keyword are used by the
DSPAM client. Dropping them is not what you want (I guess).
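Put together, a configuration that keeps both forms might look like the
sketch below. Using TUM on both lines is an assumption for illustration,
taken from the accounts mentioned earlier in the thread.

```
# Global entry, read by the DSPAM agent/daemon (per the explanation above)
TrainingMode tum
# Default preference, used on the client side and overridable per user
Preference "trainingMode=TUM"
```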
- René
------------------------------------------------------------------------------
This SF email is sponsored by:
Try Windows Azure free for 90 days Click Here
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
--
Kind Regards from Switzerland,
Stevan Bajić