There's a problem with the updates_spamassassin_org.cf file. It contains:
include updates_spamassassin_org/MIRRORED.BY
include updates_spamassassin_org/languages
include updates_spamassassin_org/triplets.txt
include updates_spamassassin_org/user_prefs.template
The first file does not exist.
Should be fixed now. I forgot to check whether str2time was able to
parse the date that it was given. Can you check one of the messages
that was generating the warning to verify?
Cheers,
Henry
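The missing check described above can be illustrated with a minimal sketch. It is written in Python, not the Perl that the actual fix would be in, and the function name `received_time` is hypothetical. It shows only the general pattern: test whether the date parser succeeded before handing its result to a gmtime-style call.

```python
import time
from email.utils import mktime_tz, parsedate_tz


def received_time(date_header):
    """Return the header date as a UTC struct_time, or None if unparseable.

    Hypothetical helper for illustration; not taken from SpamAssassin.
    """
    parsed = parsedate_tz(date_header)
    if parsed is None:
        # The parser could not make sense of the date. Stop here rather
        # than passing an undefined value on to gmtime.
        return None
    # mktime_tz applies the timezone offset and returns a UTC timestamp.
    return time.gmtime(mktime_tz(parsed))
```

A caller can then skip or log messages for which the function returns None, instead of emitting a warning for each one.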
Theo Van Dinter wrote:
I got 500K of:
Use of uninitialized value in gmtime at
I'd expect that the 700k message corpus will be more prone to errors
than the 2M message corpus. It still might be good enough.
I'm not convinced that rescoring (as opposed to putting in new rules)
will do much for 3.0.5's accuracy. If people really want to go to the
trouble of running the
Hi Alexander,
Does your implementation respect the additional constraints required by
SpamAssassin? The constraints are as follows:
1. Only nice rules may have scores less than 0.
2. No rule may have a score above 5.
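As a rough sketch of the two constraints just listed, a candidate score could be clamped as follows. The function name `constrain_score` and the `is_nice` flag are assumptions made for illustration. This is not the score generator's actual code, and it assumes that non-nice rules are simply floored at 0.

```python
MAX_SCORE = 5.0  # upper bound from constraint 2


def constrain_score(score, is_nice):
    """Clamp a candidate score so that it satisfies both constraints.

    Hypothetical helper: the names and the flooring at 0 are assumptions.
    """
    if not is_nice:
        # Constraint 1: only nice rules may go below zero.
        score = max(score, 0.0)
    # Constraint 2: no rule may score above 5.
    return min(score, MAX_SCORE)
```

For example, `constrain_score(-1.2, is_nice=False)` gives 0.0, while the same value passes through unchanged for a nice rule.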
Constraint 1 is required because it must be impossible for a spammer to
add
Most of this stuff is legacy code from the craig-evolve.c days. I
didn't modify logs-to-c's output function. If it ain't broke, don't
fix it.
num_mutable is the number of mutable tests (instead of immutable tests).
Thanks for your attention to detail.
Henry
Justin Mason wrote:
As far as I know, I am only waiting on one person's mass-check results.
Unless you speak up before he uploads them, I'm going to start the
score generation without you! ;)
Henry
Mass check submissions are closed. I won't be picking up any more.
Thanks everyone!
Henry Stern wrote:
As far as I know, I am only waiting on one person's mass-check results.
Unless you speak up before he uploads them, I'm going to start the
score generation without you! ;)
Henry
I'm not sure what I'll *need* to make good scores. Last time around, the
results were pants (--reuse was broken), so I don't have much to go on as
far as numbers are concerned.
Cheers,
Henry
On Wed, 20 Jul 2005, Justin Mason wrote:
Theo Van
+1
On Tue, 19 Jul 2005, Daniel Quinlan wrote:
I propose we create a Rules Project as a part of Apache SpamAssassin.
Initially, the project will consist of the existing (empty) rules
directory in Subversion (the CVS replacement used by the ASF).
Each committer will have their own sandbox to
+1
Daniel Quinlan wrote:
I propose we start over with the rules unzeroed (not a massively
significant change, but I think it is helpful) and Michael's reuse patch
so messages without X-Spam-Status will have non-realtime results (a more
significant change). Please vote on this and we'll repeat
(09:29:43) Henry: can we take masses/* out of R-T-C mode, since this is
the rare time that it gets any attention?
(09:29:58) Daniel Quinlan: yes
(09:30:04) Daniel Quinlan: for MINOR changes ;-)
(09:30:52) Daniel Quinlan: post to dev@ about it and note my agreement
(09:31:01) Daniel Quinlan: just
I'm making the trek across the pond!
Henry
Theo Van Dinter wrote:
On Wed, Jun 22, 2005 at 04:28:41AM -0400, Duncan Findlay wrote:
It'd be nice to get as many developers as possible in the same room -
in fact it'll probably be a record. I think there'll be 5 of us in the
area during CEAS?
Sidney Markowitz wrote:
As part of a term project I'm about to finish I've been looking at some
aspects of the perceptron scoring we do and have some ideas for alternatives
I would like to try.
Can someone tell me how many email samples and how many rules typically go
into the perceptron run
I was thinking of you when I wrote that. The open research question is:
Can we find all the matches for n regexes in o(n^2+m)? Can we tell
which of the component regexes have matched?
Henry
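For context, the straightforward baseline that the question above hopes to improve on can be sketched as follows. It tries each regex separately, so it does report which component regexes matched, but it scans the text once per pattern. The function name `matching_rules` and the rule names in the test are hypothetical.

```python
import re


def matching_rules(patterns, text):
    """Return the names of all patterns that match somewhere in text.

    Naive baseline: one independent search per pattern, so the work
    grows with (number of patterns) x (length of text).
    """
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    return {name for name, rx in compiled.items() if rx.search(text)}
```

Joining all patterns into one alternation is not an obvious shortcut: a backtracking engine reports only the first alternative that matches at a given position, not every component regex that could match.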
Scott A Crosby wrote:
On Tue, 17 May 2005 14:01:09 +0100, Henry Stern [EMAIL PROTECTED] writes:
3
I've only just noticed this thread now. Sorry for the delay in response.
--
Re: Boosting.
I'm really not a fan of ensemble learning algorithms such as boosting
and bagging. IMO, it is a hack used to prop up unstable learning
algorithms such as ID3 and C5.0.
What would be far more useful is an
Hello all,
Sorry for the delay here. The list was created a few days ago, but I am
in the middle of an overseas move.
The list is [EMAIL PROTECTED] To subscribe, send e-mail
to [EMAIL PROTECTED]
I won't be able to participate much (if at all) for the time being but
for an initial topic of
Hello everyone,
I hope that you have all had safe and enjoyable holidays. My apologies
for starting this discussion so close to Christmas. In all honesty, I
had forgotten that Christmas was coming.
To start things off, I propose that we create a sub-project of
SpamAssassin consisting of mailing
I'm going to get back to work on this on January 2nd once my apartment
is cleaned up from the NYE party. ;)
Henry
Dougal Campbell wrote:
Harry wrote:
As for starting a project, I think it would be good idea. I think there
may be a cat herding issue though.
So, any news on the cat-herding front,
I'd have to take this into account when optimising the scores. Then,
since the scores would be optimised for multiple hits, spammers would
only have to reduce the number of hits to evade SpamAssassin.
It's the same reason why we use a Bernoulli event model in Bayes.
Henry
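The Bernoulli comparison can be made concrete with a small sketch. Under a Bernoulli event model a token is either present in a message or absent, so repeating it adds nothing, whereas a multinomial model counts every occurrence. The function names are hypothetical and this is not SpamAssassin's Bayes code.

```python
from collections import Counter


def bernoulli_features(tokens):
    """Presence/absence view: each distinct token counts once per message."""
    return set(tokens)


def multinomial_features(tokens):
    """Count view: every repeat of a token adds to its weight."""
    return Counter(tokens)
```

With the tokens `["free", "free", "free", "offer"]`, the Bernoulli view sees only the two distinct tokens, while the multinomial view records "free" three times.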
Marc Perkel wrote:
This
There is no permanent solution to email spam (not yet anyway) and I
doubt there will be one for weblogs, it's an arms race ;) SA3 could go a
Weblog spam is completely different from e-mail spam. The objective of
the e-mail spammer is for you to read their message and respond quickly.
The opposite
as
well as to other weblog software developers that I have missed.
I look forward to collaborating with you in the future.
Best regards,
Henry Stern
Committer, SpamAssassin
.
Rather than porting SpamAssassin to weblogs, I'm suggesting that we take
what we know from the spam e-mail domain and help to come up with a
permanent solution to weblog spam.
Henry
Michael Parker wrote:
On Wed, Dec 22, 2004 at 03:00:16PM -0400, Henry Stern wrote:
I'm very interested to hear any
Sidney Markowitz wrote:
Nick Leverton said that papers he has seen found that learn on error
always works better than learn everything. But I recall one that
looked more carefully at longer term results and found that learn on
error degrades over time. They found it best to retrain on fresh data
Hi Vaishnavi,
I wrote a parser for the 12000 message SpamAssassin public corpus
(http://spamassassin.apache.org/publiccorpus) based on SpamAssassin's
Bayes code. If you would like to use it, you can download both the
parser and a pre-tokenized corpus from
- (g) -- possibly -- do a quick perceptron run to evaluate if the rule
overlaps with other rules too much.
The perceptron won't tell us much about overlap, but I'm sure that I can
come up with something to help out in that department... after I finish
my thesis.
Henry
P.S. Writing a thesis is
Hi Jeff,
You might want to reconsider your use of the entire DMOZ directory.
There may be some subtrees that you can ignore. Of the 1338 DMOZ false
positives, how many of them are from the same sections on DMOZ?
Henry
Jeff Chan wrote:
Daniel Quinlan, one of the principal SpamAssassin architects
To the best of my knowledge, Apple Mail uses latent semantic analysis
for clustering. I wrote a Slashdot comment about this a while back:
http://slashdot.org/comments.pl?sid=108111&cid=9194254
Henry
Sidney Markowitz wrote:
I stumbled across this article