Re: svn commit: r169334 - in /spamassassin/trunk: MANIFEST lib/Mail/SpamAssassin/Conf.pm lib/Mail/SpamAssassin/HTML.pm lib/Mail/SpamAssassin/PerMsgStatus.pm lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm lib/Mail/SpamAssassin/Util.pm rules/20_uri_tests.cf t/uri.t t/uri_html.t
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Theo Van Dinter writes:
Sorry to be a killjoy here.

On Mon, May 09, 2005 at 03:55:07PM -, [EMAIL PROTECTED] wrote:
+ my $redirector_patterns = $self->{conf}->{redirector_patterns};
+ @uris = Mail::SpamAssassin::Util::uri_list_canonify($redirector_patterns, @uris);

Do we consider uri_list_canonify() to be a public function? If so, there needs to be some form of backward compatibility maintained. Since there seems to be no POD for Util.pm at all, one could read that to mean it's all considered private, but we never did finish going through and changing the private function names, so it's not clear now.

I don't think it's a particularly public API, since it's already called by public APIs. In other words, I wouldn't worry about it too much.

As for the rest -- let's just check it in and hack away, as Sidney suggested. ;)

--j.

- my @parsed = $scanner->get_parsed_uri_list();
+ my @parsed = $scanner->get_uri_list();

-0.7 URIBL now considers the full list of URIs as having been parsed out of the rendered text, which messes up the priority levels somewhat. A case can be made that the higher-priority domains will already be on the list, but it does mean more work for the plugin (more URIs to go through).

If we're not changing get_uri_list() (or, more likely, making a new function) to return a combined uri_detail-esque dataset, then I'd like to see get_parsed_uri_list() left alone (i.e. let it do the canonification, and get_uri_list() can skip doing it), and just add a call into URIBL to get_uri_list() (we don't care about the output) to do the canonification of the HTML bits. Arguably, we could just have a new _canonify_uri_detail() call in PMS and avoid the rest of the get_uri_list() stuff, but ... We know get_uri_list() is being called elsewhere anyway, so it's not a big deal IMO.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFCgCxiMJF5cimLx9ARApgKAJ99n5wSZoEJB+AI9qHEZBmd46KVkgCfaPOg
sNXOp1xY1o04TA6c3VQfhmE=
=9Jfb
-----END PGP SIGNATURE-----
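To make the backward-compatibility question concrete: if uri_list_canonify() were treated as public, one way to keep old callers working would be to let it accept both calling conventions. The Perl sketch below is hypothetical (the package My::UtilSketch, the example redirector pattern, and the assumption that each redirector pattern captures its target URI in $1 are illustrative, not taken from Util.pm); it only shows dispatching on whether the first argument is an array reference.

```perl
use strict;
use warnings;

# Hypothetical sketch only: this is not SpamAssassin's Util.pm code.
package My::UtilSketch;

sub uri_list_canonify {
    # New calling convention: an arrayref of redirector regexps comes first.
    # Old calling convention: every argument is a URI string.
    my $redirectors = (ref($_[0]) eq 'ARRAY') ? shift : [];
    my (%seen, @out);

    foreach my $uri (@_) {
        push @out, $uri unless $seen{$uri}++;

        # Assumption: each redirector pattern captures the target URI in $1.
        foreach my $re (@$redirectors) {
            next unless $uri =~ $re && defined $1;
            my $target = $1;
            push @out, $target unless $seen{$target}++;
        }
    }
    return @out;
}

package main;

my @patterns = (qr{^http://redirect\.example\.com/\?url=(.+)$}i);

# New-style call, as in the commit under discussion.
print "$_\n" for My::UtilSketch::uri_list_canonify(\@patterns,
    'http://redirect.example.com/?url=http://spam.example.net/');

# Old-style call from third-party code keeps working.
print "$_\n" for My::UtilSketch::uri_list_canonify('http://example.org/');
```

The new-style call returns the redirector URI followed by the extracted target; the old-style call simply returns its de-duplicated input.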
Re: new tool for website -- versioned configuration reference
Daniel Quinlan writes:
[EMAIL PROTECTED] (Justin Mason) writes:
I'm not entirely sure how useful this is -- on one hand, it's a great way to point to a single configuration setting. OTOH, it's another thing that can break, so I want to make sure more than one person thinks it's useful first ;)

Interesting. Why not have different subdirectories for each version? Easier and less apt to break. Maybe have a version selector from a menu like CPAN.

That would be easier to do. Probably a good idea, sth like:

  /ref/3.0.x/use_auto_whitelist.html   displays 3.0.x doco
  /ref/2.6x/use_auto_whitelist.html    displays 2.6x doco
  /ref/use_auto_whitelist.html         displays latest stable doco, 3.0.x

and pages have the selector to display other versions, if they exist.

Also, does this handle the configuration documentation changing slightly from version to version? They all appear to be the same.

It just uses the latest, if the =item exists in more than one version.

--j.
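The per-version /ref/ layout proposed above could be generated from a simple table of which versions document each option. This is a hypothetical Perl sketch: %documented, ref_pages() and the "latest stable" label are made up for illustration, and only the URL shapes come from the message.

```perl
use strict;
use warnings;

# Hypothetical sketch of the /ref/ URL scheme discussed above.
# %documented maps an option to the doc versions whose Conf.pm POD
# contains an =item for it; the entry below is example input only.
my $latest_stable = '3.0.x';
my %documented = (
    use_auto_whitelist => [ '2.6x', '3.0.x' ],
);

# Return [url, label] pairs for one option's version-selector menu.
sub ref_pages {
    my ($option) = @_;
    my @versions = @{ $documented{$option} || [] };
    my @pages = map { [ "/ref/$_/$option.html", $_ ] } @versions;

    # The unversioned URL shows the latest stable docs, when they exist.
    if (grep { $_ eq $latest_stable } @versions) {
        unshift @pages, [ "/ref/$option.html", "latest stable ($latest_stable)" ];
    }
    return @pages;
}

printf "%-40s %s\n", @$_ for ref_pages('use_auto_whitelist');
```

An option that no version documents simply gets no pages, so the selector never links to a missing version.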
Re: new tool for website -- versioned configuration reference
Daniel Quinlan writes:
[EMAIL PROTECTED] (Justin Mason) writes:
That would be easier to do. Probably a good idea, sth like: /ref/3.0.x/use_auto_whitelist.html

I'd strongly advocate providing versioned links for the current documentation *first*, and afterwards we can worry about option links.

  /ref/3.0.0/Mail::SpamAssassin::Conf.html

What, like what's already there? http://spamassassin.apache.org/doc.html

--j.
Re: Why does SA 3.0 require Perl 5.6.1?
Barry Jaspan writes:
The two main things I seem to remember are that perl 5.6.1 fixes a bunch of bugs from 5.6.0

If anyone can remember what bugs were fixed that affect SpamAssassin, I'd appreciate it.

If I recall correctly, all of them were ExtUtils::MakeMaker-related... some crawling through the bugzilla ;) would probably turn up the details.

--j.

FYI: MacOS X default perl has several issues.

No kidding. :-) Luckily, the setuid() and setgid() problem is not an issue for me.

Thanks,
Barry
Note: This message was dictated using voice recognition software. Please excuse any errors I missed.

The main (and rather disturbing) one that I can recall is that it has absolutely no support for setuid() and setgid(), which we came across when working on spamd.
Re: BAYES_* scores - non-monotonic?
Hi Alan!

Alan Schwartz writes:
Lately, running SA 3.0.0 with no rule or score tuning, I have been noticing that my false negatives tend to have BAYES_99 matched. The scores file lists the following scores for Bayes:

  50_scores.cf:score BAYES_00 0 0 -1.665 -2.599
  50_scores.cf:score BAYES_05 0 0 -0.925 -0.413
  50_scores.cf:score BAYES_20 0 0 -0.730 -1.951
  50_scores.cf:score BAYES_40 0 0 -0.276 -1.096
  50_scores.cf:score BAYES_50 0 0  1.567  0.001
  50_scores.cf:score BAYES_60 0 0  3.515  0.372
  50_scores.cf:score BAYES_80 0 0  3.608  2.087
  50_scores.cf:score BAYES_95 0 0  3.514  2.063
  50_scores.cf:score BAYES_99 0 0  4.070  1.886

I realize that these scores come out of the automated algorithm, but they are not sensible on their face, and suggest a potential problem with the Bayesian classifier's operation or the mass check. Note that even without network tests, BAYES_95 < BAYES_80, BAYES_60. With network tests, BAYES_05 is > BAYES_20, BAYES_40, and BAYES_99 < BAYES_95 < BAYES_80.

It would not be unreasonable to constrain the BAYES_* scores so that they are always monotonic in the predicted probability of spam. This constraint would likely cause the scores associated with other rules to change slightly, but might not reduce the overall accuracy of SA in the mass check corpus (perhaps you're in some kind of local minimum?)

Yeah, we've noticed that -- if I recall correctly, generally it *doesn't* seem to work out better to constrain them; possibly because the BAYES_99 spam is already hitting many other rules. The score generation tries to minimise rule scores without losing hits, to avoid FPs having major effects. I think we tried locked-down BAYES scores, and found *lower* overall accuracy figures. I'm not certain, though...

--j.

I hope this makes sense. I'd be very interested in hearing about other experiences with this.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Alan Schwartz [EMAIL PROTECTED]
Author/Co-author of: Managing Mailing Lists, SpamAssassin, Stopping Spam, and Practical Unix & Internet Security, 3rd Ed
Published by O'Reilly Media, Inc. (http://www.oreilly.com)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
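Alan's observation can be checked mechanically. This Perl sketch takes the BAYES_* scores exactly as quoted from 50_scores.cf and lists every adjacent pair whose score drops as the Bayes probability bucket rises. It assumes the third and fourth score columns are score sets 2 (Bayes, no network tests) and 3 (Bayes plus network tests), which matches how the message describes them; non_monotonic() is a hypothetical helper name.

```perl
use strict;
use warnings;

# BAYES_* scores from the 50_scores.cf lines quoted above, ordered by
# increasing spam probability: [rule, score set 2, score set 3].
# (Set 2 = Bayes without network tests, set 3 = Bayes with network tests.)
my @bayes = (
    [ BAYES_00 => -1.665, -2.599 ],
    [ BAYES_05 => -0.925, -0.413 ],
    [ BAYES_20 => -0.730, -1.951 ],
    [ BAYES_40 => -0.276, -1.096 ],
    [ BAYES_50 =>  1.567,  0.001 ],
    [ BAYES_60 =>  3.515,  0.372 ],
    [ BAYES_80 =>  3.608,  2.087 ],
    [ BAYES_95 =>  3.514,  2.063 ],
    [ BAYES_99 =>  4.070,  1.886 ],
);

# Report each adjacent pair whose score drops as spam probability rises.
sub non_monotonic {
    my ($col) = @_;    # 1 = score set 2, 2 = score set 3
    my @bad;
    for my $i (1 .. $#bayes) {
        my ($prev, $cur) = @bayes[$i - 1, $i];
        push @bad, "$prev->[0] > $cur->[0]" if $prev->[$col] > $cur->[$col];
    }
    return @bad;
}

print "set 2: ", join('; ', non_monotonic(1)), "\n";
print "set 3: ", join('; ', non_monotonic(2)), "\n";
```

With these numbers it reports BAYES_80 > BAYES_95 for set 2, and BAYES_05 > BAYES_20, BAYES_80 > BAYES_95 and BAYES_95 > BAYES_99 for set 3. It only compares neighbours, so the non-adjacent BAYES_95 < BAYES_60 case in set 2 is not listed on its own.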
Re: svn commit: rev 43640 - spamassassin/trunk
[EMAIL PROTECTED] writes:
+If you use Debian, you can get Storable from the libstorable-perl
+package.

might be better to just have:

  Debian: apt-get install libstorable-perl
  Fedora: apt-get install perl-Storable

since I can see other OS equivalents getting added there...

--j.
Re: svn commit: rev 43688 - spamassassin/trunk/spamd
Isn't the SYNOPSIS used for spamd's usage message? If so, I think removing the defaults from that message is a bad thing...

--j.

[EMAIL PROTECTED] writes:
Author: mss
Date: Fri Sep 10 13:11:49 2004
New Revision: 43688

Modified:
   spamassassin/trunk/spamd/spamd.raw
Log:
POD fix: Truncated SYNOPSIS lines to 80 chars. Mostly by removing the default values -- they are in the long description of the option, and adding default values for *all* descriptions would make the SYNOPSIS long and complex. So everybody who wants to know the defaults should read the man page. (I'm thinking about breaking the SYNOPSIS into several parts to make it less confusing.)

Modified: spamassassin/trunk/spamd/spamd.raw
==============================================================================
--- spamassassin/trunk/spamd/spamd.raw   (original)
+++ spamassassin/trunk/spamd/spamd.raw   Fri Sep 10 13:11:49 2004
@@ -1912,23 +1912,25 @@
  -c, --create-prefs                 Create user preferences files
  -C path, --configpath=path         Path for default config files
- --siteconfigpath=path              Path for site configs (def: /etc/mail/spamassassin)
+ --siteconfigpath=path              Path for site configs
  -d, --daemonize                    Daemonize
  -h, --help                         Print usage message.
- -i [ipaddr], --listen-ip=ipaddr    Listen on the IP ipaddr (default: 127.0.0.1)
- -p port, --port                    Listen on specified port (default: 783)
- -m num, --max-children=num         Allow maximum num children (default: 5)
- --max-conn-per-child=num           Maximum connections accepted by child before exiting
+ -i [ipaddr], --listen-ip=ipaddr    Listen on the IP ipaddr
+ -p port, --port                    Listen on specified port
+ -m num, --max-children=num         Allow maximum num children
+ --max-conn-per-child=num           Maximum connections accepted by child
+                                    before it is respawned
  -q, --sql-config                   Enable SQL config (only useful with -x)
  -Q, --setuid-with-sql              Enable SQL config (only useful with -x, enables use of -H)
  --ldap-config                      Enable LDAP config (only useful with -x)
  --setuid-with-ldap                 Enable LDAP config (only useful with -x, enables use of -a and -H)
- --virtual-config-dir=dir           Enable pattern based Virtual configs (needs -x)
+ --virtual-config-dir=dir           Enable pattern based Virtual configs
+                                    (needs -x)
  -r pidfile, --pidfile              Write the process id to pidfile
- -s facility, --syslog=facility     Specify the syslog facility (default: mail)
- --syslog-socket=type               How to connect to syslogd (default: unix)
+ -s facility, --syslog=facility     Specify the syslog facility
+ --syslog-socket=type               How to connect to syslogd
  -u username, --username=username   Run as username
  -v, --vpopmail                     Enable vpopmail config
  -x, --nouser-config                Disable user config files
@@ -1938,7 +1940,7 @@
  -D, --debug                        Print debugging messages
  -L, --local                        Use local tests only (no DNS)
  -P, --paranoid                     Die upon user errors
- -H dir, --helper-home-dir=dir      Specify a different HOME directory, path optional
+ -H [dir], --helper-home-dir[=dir]  Specify a different HOME directory
  --ssl                              Run an SSL server
  --server-key keyfile               Specify an SSL keyfile
  --server-cert certfile             Specify an SSL certificate
class renaming
So in thinking about the class cleanup we've been wanting to do for a while, I think the top items on the list (my list at least) are:

- rename the Mail::SpamAssassin::PerMsgStatus class
- break it up into multiple, smaller classes

So here's what I propose for the first one.

Rename the Mail::SpamAssassin::PerMsgStatus class

Initially the purpose was as a per-message status object, describing the results of a scan of one message -- in other words, that message's spam status -- "is it spam or not?". I think we all now agree the name is not so hot ;) Its purpose has eventually turned out to be:

- (public) methods that actually cause the scan to happen
- (public) the results of a scan operation
- (public) message rewriting functionality
- (internally, plugins) state for a scan operation; plugins and our code can store state on this object for the duration of the scan
- (plugins) a set of APIs to access aspects of the message being scanned, and plugin support APIs
- (internally) methods that implement Eval tests
- (internally) methods that control how tests are run, their ordering etc.
- (internally) methods that implement the DNS event-driven algorithm
- (internally) methods that perform auto-learning
- (internally) methods that compile tests into perl bytecode at runtime
- (internally) methods that parse aspects of the message
- (internally) the tests compiled as perl bytecode

So in my opinion, in cases like this where there are lots of internal and external APIs, it's more sensible to name the class after what its external APIs do. (In fact, most OO design would indicate that this means you need to refactor out into more than one class. I'm getting to that ;)

So, I think Mail::SpamAssassin::Scan is a better name -- the object returned from M::SpamAssassin::check() is the results of a scan of a single message. (Scanner is another possibility, but I think Scan is better because we aren't returning the object that *did* the scan, we're returning the *results* of the scan.)
The next thing is backwards compatibility. We can only do this if we don't break third-party code. We *can* rename *this* class without breaking backwards compatibility, thankfully. Our requirements here are:

1. plugins and third-party perl code will very likely contain "use Mail::SpamAssassin::PerMsgStatus;" lines, so having some kind of usable file there is a MUST.
2. there are possibly locations in third-party code where a Mail::SpamAssassin::PerMsgStatus object is created other than through the Mail::SpamAssassin::check() API, so being able to support that is a SHOULD.
3. However, the majority of callers should not be creating PerMsgStatus objects directly, or depending in any way on the object being of that specific type. (Hooray: perl's not strongly typed! ;)

Here's how I propose to do that:

- rename the current Mail::SpamAssassin::PerMsgStatus class to Mail::SpamAssassin::Scan
- create a Facade Mail::SpamAssassin::PerMsgStatus object that is a sub-class of ::Scan, with no additional methods or data. In other words, all method calls and member var accesses will fall through into ::Scan.
- If we deprecate any 3.0.0 APIs in the 3.1.0 cycle, we can move their backwards compatibility methods into that facade class, because 3.1.0 code will be Scan-native.
- keep the facade object around for a while, at least until the next major cycle, because it's super-cheap; we won't even have a "use" line for it in our code, so it'll take up roughly 200 bytes on disk and that's it.

Sound useful? In my opinion this is definitely useful in the 3.1.0 tree.

Break it up into multiple, smaller classes

OK, part the second. In my opinion, this is also a very good idea -- as the XP guys say, PerMsgStatus has a bad "code smell" -- it's a big file with lots of totally different functionality mushed into one class. In fact, it even loads methods from *multiple* files, which is totally nasty.
;)

Here are some more details about what APIs are on the ::PerMsgStatus object (or the ::Scan object, as it may be renamed):

- (public) methods that actually cause the scan to happen
    check
  This should be left as a public API, but its code moved to a new class. See "(internally) methods that control how tests are run, their ordering etc." below.

- (public) the results of a scan operation
    is_spam
    get_names_of_tests_hit / get_names_of_subtests_hit
    get_score / get_hits
    get_required_score / get_required_hits
    get_autolearn_status / get_report
    get_content_preview
    finish
  These are the main thing that the ::Scan object does, so they stay.

- (public) message rewriting functionality
    rewrite_mail
  move code into another class; leave this public API on the ::Scan object
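A minimal sketch of the facade idea from the renaming proposal, assuming a heavily trimmed Mail::SpamAssassin::Scan (its constructor, fields and the fixed required score of 5 are invented for illustration, as is the get_hits() alias placed on the facade). The point is only that an empty subclass listed in @ISA lets old code keep constructing and calling Mail::SpamAssassin::PerMsgStatus while every method resolves in ::Scan.

```perl
use strict;
use warnings;

# Hypothetical, heavily trimmed stand-in for the renamed class.
package Mail::SpamAssassin::Scan;

sub new {
    my ($class, %args) = @_;
    my $self = { score => $args{score} || 0, required_score => 5 };
    return bless $self, $class;
}
sub get_score          { return $_[0]->{score} }
sub get_required_score { return $_[0]->{required_score} }
sub is_spam {
    my ($self) = @_;
    return $self->get_score() >= $self->get_required_score() ? 1 : 0;
}

# The facade: a subclass with no data of its own. Method calls and
# member accesses fall through to ::Scan via @ISA. In a real tree this
# would live in PerMsgStatus.pm and load Scan.pm first.
package Mail::SpamAssassin::PerMsgStatus;
our @ISA = ('Mail::SpamAssassin::Scan');

# A deprecated 3.0.x-era name could be parked here later (illustration).
sub get_hits { return $_[0]->get_score() }

package main;

my $legacy = Mail::SpamAssassin::PerMsgStatus->new(score => 7.2);
printf "isa Scan: %d, is_spam: %d, get_hits: %s\n",
    ($legacy->isa('Mail::SpamAssassin::Scan') ? 1 : 0),
    $legacy->is_spam(), $legacy->get_hits();
```

The single-file layout only keeps the sketch runnable; the proposal's "roughly 200 bytes on disk" facade would be its own small .pm file.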
Re: [Bug 3825] Unescaped '#' in rawbody causes havoc
Daniel Quinlan writes:
I think it would be better if we did not allow end-of-line comments and required all comments to match:

  /^\s*#/

Then comments don't need to be escaped. I think that would involve less surprise and also solves the problem. I don't think this is purely a documentation problem.

That would be a major change in how our configuration files are parsed, breaking a documented (although not particularly clearly) convention that's been there since the project began. It's also inconsistent with the convention for this configuration file format. The escaped-hash thing works fine (and I've used it myself at times), and just needs to be documented. I'm not keen on that at all: -1

--j.
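For reference, the end-of-line comment convention being defended can be expressed in a few lines of Perl. This is a sketch of the convention (an unescaped # starts a comment, \# is a literal hash), not a copy of SpamAssassin's parser; the function name strip_config_comment() and the sample rules are hypothetical. The third sample shows the bug 3825 failure mode: an unescaped # silently truncates the rule.

```perl
use strict;
use warnings;

# Sketch of the end-of-line comment convention: an unescaped '#'
# starts a comment; '\#' is a literal hash character.
sub strip_config_comment {
    my ($line) = @_;
    $line =~ s/(?<!\\)#.*$//;    # drop everything from the first unescaped '#'
    $line =~ s/\\#/#/g;          # unescape '\#' to '#'
    $line =~ s/^\s+|\s+$//g;     # trim surrounding whitespace
    return $line;
}

for my $line ('rawbody FOO /color=\#ff0000/i   # red text',
              '# a whole-line comment',
              'rawbody BAR /color=#ff0000/i') {
    printf "%-40s => [%s]\n", $line, strip_config_comment($line);
}
```

The BAR rule comes out as "rawbody BAR /color=", which is the kind of garbage that later surfaces as a confusing Perl error.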
Re: [Bug 3825] Unescaped '#' in rawbody causes havoc
Daniel Quinlan writes:
Justin Mason [EMAIL PROTECTED] writes:
That would be a major change in how our configuration files are parsed, breaking a documented (although not particularly clearly) convention that's been there since the project began. It's also inconsistent with the convention for this configuration file format.

It would be better if our parser detected invalid lines rather than outputting perl errors due to us parsing garbage. That's my main concern. I am actually fine with requiring # to be escaped. My main concern is the non-clarity of the error statements; documentation does not fix that.

Ah, gotcha. Sorry, I'm a bit tired today so comprehension's not quite at full speed. (Just back from Toorcon.)

(Side note: although not a requirement for this, getting rid of EOL comments would make this easier if it was coupled with a requirement that # be escaped.)

Y'see, I think that's the red herring -- there are many other ways to screw up the syntax of rules, e.g.

  rawbody HAS_RED_BODY_BG /body bgcolor=[']/f/i

would similarly produce horrible perlish syntax errors, and there are no hashes involved there at all.

BTW, if you can wait a little bit, I have a patch from McAfee's tree that does this nicely, if I recall correctly -- it catches compile-time errors in the rules and outputs a decent error message warning of a syntax error in that rule, by name. ;)

--j.
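The "catch compile-time errors per rule and name the rule" approach mentioned above might look roughly like this. It is a hypothetical sketch (compile_rules() and the message wording are invented, and it is not the McAfee patch): each rule's match code is compiled in its own string eval, so a broken pattern produces one readable error naming the rule instead of a Perl syntax error in a large generated block. String eval is only acceptable here because rule text is assumed to come from trusted configuration files.

```perl
use strict;
use warnings;

# Hypothetical sketch: compile each rule's match code separately, so a
# broken rule yields an error naming the rule rather than one opaque
# Perl syntax error for a whole generated test block.
sub compile_rules {
    my (%rules) = @_;    # rule name => pattern text, e.g. '/foo/i'
    my (%compiled, @errors);
    foreach my $name (sort keys %rules) {
        # Rule text becomes Perl code via string eval; trusted input assumed.
        my $code = eval "sub { \$_[0] =~ $rules{$name} }";
        if (ref $code eq 'CODE') {
            $compiled{$name} = $code;
        } else {
            (my $err = $@) =~ s/\s+\z//;
            push @errors, "config: syntax error in rule $name: $err";
        }
    }
    return (\%compiled, \@errors);
}

my ($ok, $errs) = compile_rules(
    GOOD_RULE   => '/viagra/i',
    BROKEN_RULE => '/(unclosed/i',
);
print "compiled: ", join(', ', sort keys %$ok), "\n";
print "$_\n" for @$errs;
```

The good rule keeps working and the broken one is skipped with a single message that names BROKEN_RULE, which addresses the "non-clarity of the error statements" concern without changing the comment syntax.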
Re: class renaming
Malte S. Stretz writes:
On Sunday 26 September 2004 10:42 CET Daniel Quinlan wrote:
[...] Do we really need to do this now? This is not going to significantly help performance, accuracy, or memory usage, is it?

As much as I'd love to have this thing renamed, why didn't we do this *before* we released 3.0? Or to quote you from bug 3668: "there's *no way* I'd be happy making any of these changes before 4.0.0" ;) (Actually, the "no way" is exaggerated, but I don't like the idea at this point.)

Well, that's a different kettle of fish -- bug 3668 is changing configuration file paths; this is changing a class name, and ensuring that backwards compatibility is preserved for that change.

--j.
Re: [Bug 3821] scores are overoptimized for training set
Loren Wilton writes:
BTW, this is the rule reliability tflag idea again; basically provide a way to hint that "this rule is reliable" and "this rule should not be considered reliable" -- no matter what their hit-rates in mass-checks were. I agree it may have good effects as a hint to the Perceptron, so it may now be time to do this. What d'you think, Henry?

Note that Bob M. has a hint comment of his own that gives several levels of hint, not just a binary value. He uses this for his own scoring tool with good results. I think that the idea of a multi-level hint is a good one and should be considered. I don't know if that concept will fit in tflags. If not, perhaps some other command ("scorehint") could be considered.

Yeah -- definitely -- I was thinking that, although I didn't mention it. ;) IMO a new config command (I was thinking "reliability" or similar) would be good.

--j.
Re: class renaming
Daniel Quinlan writes:
[EMAIL PROTECTED] (Justin Mason) writes:
- (public) message rewriting functionality
    rewrite_mail
  move code into another class; leave this public API on the ::Scan object, which calls into that class. Proposed class name: Mail::SpamAssassin::Scan::Rewriter?

Rewriting should not be part of the Scan object.

Yes, actually, you're right.

I'd propose that rewriting be part of a Mail::SpamAssassin::Format class.

Any particular reason for that name?

- (internally) methods that implement Eval tests
    [entire contents of EvalTests.pm, which do this horrible hack of putting themselves into the PerMsgStatus namespace]
  move code into another namespace. Eval tests use the PerMsgStatus object as $self, and since they're just functions, not objects themselves, that doesn't need to change -- they'd still get the ::Scan object as their first arg. Proposed namespace: Mail::SpamAssassin::Test::Eval?

Just Mail::SpamAssassin::Tests.pm?

Yeah, actually, why not.

- (internally) methods that control how tests are run, their ordering etc.
    [parts of check]
    [parts of do_head_tests / etc.]
  Definitely move. Proposed class: Mail::SpamAssassin::TestRunner? RunTests? Runner? Scanner?

Shouldn't this just be part of Scan?

This is the thing -- as Theo said, by moving to a new class, we can provide the ability to switch out implementations without having to change the class of the Scan object (i.e. what gets returned to the user). Basically the key idea is that we're breaking it up by *what it does* and what its semantics are:

- Scan: object returned to user
- [this class]: object that contains the algorithm and code to run whatever subset of the tests in whatever order

And the idea is that all this logic shouldn't be in the simple results object we give back to the user.
- (internally) methods that implement the DNS event-driven algorithm
    [entire contents of Dns.pm, which do this horrible hack of putting themselves into the PerMsgStatus namespace]
  into Mail::SpamAssassin::TestRunner as above?

I'd say this belongs in the EvalTests module, wherever it ends up.

Hmm, not sure about that. The EvalTests module can be kept for just the eval tests that are defined; this is plumbing. In fact, it's more similar to the TestRunner chunk IMO. There's about 650 lines of code there, too, which is a lot (for perl).

- (internally) methods that perform auto-learning
    learn
  Proposed class: Mail::SpamAssassin::AutoLearn? (I don't think mushing into PerMsgLearner, Bayes, or Mail::SpamAssassin makes sense, so a new class would be better.)

I think there's too much breaking up of stuff here. Bayes would be fine.

Yeah, OK, Bayes is probably good enough, alright.

Do we really need to do this now? This is not going to significantly help performance, accuracy, or memory usage, is it? What's the effect on stability? How does this affect our release cycle?

OK, OK. It's not much use for any of those -- but the "all mushed into one class"-ness of PerMsgStatus is really driving me nuts ;) It's far from good OO design. And bad code smell is an indicator that there are inefficiencies there.

I do have an idea for improving performance -- separate mail to follow.

--j.
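The proposed split between the results object and the code that runs tests can be sketched in Perl as follows. The class names are the ones floated in this thread (none of them exist as written), and the single all-caps-subject eval test, its score and the got_hit() helper are invented for illustration. It shows two claims from the discussion: eval tests stay plain functions that receive the Scan object as their first argument, and the runner can be swapped without changing what the caller gets back.

```perl
use strict;
use warnings;

# Hypothetical sketch of the proposed split; these packages are the
# names floated in this thread, not existing SpamAssassin classes.

# Results object handed back to the caller.
package Mail::SpamAssassin::Scan;
sub new {
    my ($class, %args) = @_;
    return bless { msg => $args{msg}, hits => {}, score => 0 }, $class;
}
sub got_hit {
    my ($self, $rule, $score) = @_;
    $self->{hits}{$rule} = 1;
    $self->{score} += $score;
}
sub get_score              { return $_[0]{score} }
sub get_names_of_tests_hit { return join ',', sort keys %{ $_[0]{hits} } }

# Eval tests stay plain functions that get the Scan object as $self.
package Mail::SpamAssassin::Tests;
sub check_subject_all_caps {
    my ($self) = @_;
    return $self->{msg}{subject} =~ /\A[^a-z]*[A-Z][^a-z]*\z/ ? 1 : 0;
}

# The algorithm (which tests, in what order) lives in a swappable runner.
package Mail::SpamAssassin::TestRunner;
sub new {
    my ($class, @tests) = @_;    # each test: [name, coderef, score]
    return bless { tests => [@tests] }, $class;
}
sub run {
    my ($self, $scan) = @_;
    foreach my $test (@{ $self->{tests} }) {
        my ($name, $func, $score) = @$test;
        $scan->got_hit($name, $score) if $func->($scan);
    }
    return $scan;    # the caller always gets a plain Scan back
}

package main;

my $runner = Mail::SpamAssassin::TestRunner->new(
    [ 'SUBJ_ALL_CAPS', \&Mail::SpamAssassin::Tests::check_subject_all_caps, 1.5 ],
);
my $scan = $runner->run(
    Mail::SpamAssassin::Scan->new(msg => { subject => 'FREE MONEY' }));
print join(' ', $scan->get_names_of_tests_hit, $scan->get_score), "\n";
```

A different runner (say, one that reorders tests or short-circuits) only needs the same run() method; callers of check() would never see the difference.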
speedup for PerMsgStatus
OK, here's a trick I was thinking about. Currently we have these massive hashtable refs:

  $pms->{conf}->{rbl_evals}
                {head_tests}
                {body_tests}
                {scoreset}->[0,1,2,3]
                {tflags}

Each of those is keyed by the name of the rule.

Now the thing is, this is really wasteful speed-wise (not really RAM-wise) -- just performing all those hash lookups! When a message is scanned, each of the _evals and _tests hashes is iterated over, extracting the rule name and rule text for every entry. In reality, we only need the rule text at this point, *not* the name.

- We have about 700 rules
- 99% of the time, any given rule will NOT fire, so we should speed up:

  foreach my $rulepat (@{all_rules_of_given_type}) {
    ...
    if ($whatever =~ /$rulepat/) {
      # hit!
    }
    # otherwise miss!
  }

We should speed up the 'foreach', the rule-text fetch, and the 'miss'. Note that we don't need to know the rule name until the rule gives us a hit!

So I'm thinking that we should replace parts of this with arrays, using integer indexes, instead of hashes with string indexes. Array lookups are quite a bit faster than hash lookups.

Each array would have RAM usage of -- guessing -- (size_of_whats_stored + 9100) bytes, since arrays in perl have an overhead of about 13 bytes per entry (700 rules x 13 bytes). (This is about the same as hashes iirc, poss a bit less. Not sure if there'd be RAM savings there, since perl hash keys are refcounted shared strings iirc.)

We can optimize for the rules that are loaded from the system-wide config, because (a) allow_user_rules is almost always off, and (b) even if it's on, I'd guess that most times 99% of the rules that a scan runs would be system-wide rules anyway. (We can deal with user rules by just pushing them onto the rules array when they're defined, same as the system rules are done.)

--j.
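A minimal sketch of the array-based layout proposed above, assuming rules are compiled with qr// when the config is loaded (case-insensitive here purely for the example); add_rule(), run_body_rules() and the sample rules are hypothetical. The hot loop only indexes an array of compiled patterns, and the rule name is fetched from a parallel array only when a rule hits.

```perl
use strict;
use warnings;

# Hypothetical sketch of the proposal: rules in parallel arrays indexed
# by integer, so the hot loop does no hash lookups, and the rule name is
# only fetched when a rule actually hits.
my (@rule_names, @rule_res);

sub add_rule {
    my ($name, $pattern) = @_;
    push @rule_names, $name;            # user rules could be pushed the same way
    push @rule_res,   qr/$pattern/i;    # compiled once, at config-load time
}

sub run_body_rules {
    my ($text) = @_;
    my @hits;
    for my $i (0 .. $#rule_res) {
        next unless $text =~ $rule_res[$i];   # the common case: a miss
        push @hits, $rule_names[$i];          # name looked up only on a hit
    }
    return @hits;
}

add_rule(FREE_MONEY  => '\bfree money\b');
add_rule(LOSE_WEIGHT => '\blose weight\b');
add_rule(CLICK_HERE  => '\bclick here\b');

print join(',', run_body_rules('Get free money, click here!')), "\n";
```

Run as-is, it prints FREE_MONEY,CLICK_HERE. Whether this is measurably faster than the hash version on a given perl is something to benchmark rather than assume.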
Re: speedup for PerMsgStatus
Loren Wilton writes:
I have no idea how painful linked lists are in Perl (or if they even exist).

Why are you commenting then???

Because they are very useful, as I pointed out.

They don't exist as a native data structure. Arrays are fast, painless, and dynamically sized.

They don't exist as a native data structure in C++ either. But they get a lot of use. Even when template classes exist to do reasonably fast and reasonably painless dynamic arrays. For certain things (like collections of objects that can get reordered frequently) they are generally more efficient than dynamic arrays. If there is an SA coding requirement for only using native data structures, then forget lists. If no such requirement exists and there is an interest in optimizing performance, then they should be a tool to be considered.

Unfortunately, perl speed optimisation doesn't work like that. The reason is that perl native data structures (arrays, hashes, strings, numeric SVs, etc.) can be looked up in one perl OP, but a user-defined data structure cannot. The OP is the lowest-level command in the perl VM, equivalent to an assembly opcode, and as such is very very fast -- since the innards of an OP are pure C. That's why regexp matching in perl is as fast as it is in C -- because a regexp match is compiled to a single OP. (Perl's not like Java in that respect. Perl's VM has quite high-level opcodes, whereas Java's is more like real assembly and more low-level. That's why perl is faster than java ;)

Unfortunately, when reading fields in a perl data structure like a hash or array, and traversing reference chains, each variable access, and each ref dereference, is an individual OP. So the upshot is that using a native perl data type will always be faster than defining a new non-native data type structure in perl.

cf. http://www.ccl4.org/~nick/P/Fast_Enough/#ops_are_bad,_m%27kay for more details...
In fact, I'm even considering looking into some use of pack() here for the very reasons noted above ;)

(ps. I'm sure if I got any of that wrong Matt will correct me ;)

--j.
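A rough way to see the ops argument in practice is to compare a native array with a hand-rolled linked list built from hashrefs. The sketch below uses the core Benchmark module's cmpthese(); the list size, the sum_array()/sum_list() helpers and the RUN_BENCH environment switch are arbitrary choices for illustration. No timings are claimed here, because they depend on the perl build and machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Compare a native Perl array with a hand-rolled linked list built from
# hashrefs. Each hop in the list costs extra ops (hash element fetch,
# dereference), while the array loop does less work per item.
my $n = 1000;
my @array = (1 .. $n);

my $head;
for my $v (reverse 1 .. $n) {
    $head = { value => $v, next => $head };    # prepend, so the list runs 1..$n
}

sub sum_array { my $s = 0; $s += $_ for @array; return $s }

sub sum_list {
    my $s = 0;
    for (my $node = $head; $node; $node = $node->{next}) {
        $s += $node->{value};
    }
    return $s;
}

# Both walks must agree before any timing means anything.
die "mismatch" unless sum_array() == sum_list();

# Timing is opt-in (about one CPU second per variant).
cmpthese(-1, { array => \&sum_array, linked_list => \&sum_list })
    if $ENV{RUN_BENCH};
```

Running it with RUN_BENCH=1 prints cmpthese()'s comparison table; the expectation from the explanation above is that the array walk wins, but measure on your own perl.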
Re: class renaming
As much as I'd love to have this thing renamed, why didn't we do this *before* we released 3.0? Or to quote you from bug 3668: "there's *no way* I'd be happy making any of these changes before 4.0.0" ;) (Actually, the "no way" is exaggerated, but I don't like the idea at this point.)

Well, that's a different kettle of fish -- bug 3668 is changing configuration file paths; this is changing a class name, and ensuring that backwards compatibility is preserved for that change.

That other bug was also about changing something newly introduced, where we wouldn't have to watch out for backwards compatibility :) Whatever, what I wanted to say is that I'm not opposed to the idea itself, and especially if it has any speed and memory advantages I'm all for it. I'm just afraid that such a major change at this early point might break at some unexpected place, as much as we try to stay backwards-compatible.

Yeah, I think at this point we have 3 devs saying -1, so I don't think it's going to happen anyway ;)

--j.
Re: svn commit: rev 47510 - spamassassin/trunk
Michael Parker writes:
On Wed, Sep 29, 2004 at 10:21:06PM -, [EMAIL PROTECTED] wrote:
+- MIMEDefang: version 2.42 or later.

FWIW, I completely disagree with doing this.
A) It will give the impression that we support these programs (I assume there will eventually be more),
B) How are we verifying that the version listed actually works?
C) Is someone going to test every single release against each program we have listed to make sure the information is still valid?
D) What criteria are we using to decide which programs get listed?

(A) well, we *do* to a degree [*]
(B) what users/devs of those tools report on the list
(C) no
(D) the volume of traffic from people asking these questions

[*]: SpamAssassin is NOT just a mail filter. It's also a suite of perl modules to perform spam identification inside other mail filters. amavisd, MIMEDefang et al are therefore supported products into which SpamAssassin can be plugged. Therefore we have to consider what documentation will help people who use those apps in using SpamAssassin.

Having said all that, I'd be +1 on taking that out of UPGRADE, replacing it with a pointer to a wiki page which contains that info.

--j.
Re: svn commit: rev 47516 - spamassassin/trunk
Malte S. Stretz writes:
Why I sort that file now and then is because it makes it much easier to see if a file is already in there or remove one which is gone. Keeping the MANIFEST up-to-date is already a PITA and an unsorted file makes it even worse (OK, there are grep and friends, but I think it's faster to scan the file with your eyes instead of calling some command).

"make distcheck" works for me ;) "make disttest" is also useful -- if a file is missing, it should cause a test to fail anyway.

--j.
Re: svn commit: rev 47516 - spamassassin/trunk
Malte S. Stretz writes:
> On Thursday 30 September 2004 18:42 CET Justin Mason wrote:
> > Malte S. Stretz writes:
> > > Why I sort that file now and then is because it makes it much easier to see if a file is already in there or remove one which is gone. Keeping the MANIFEST up-to-date is already a PITA and an unsorted file makes it even worse (ok, there are grep and friends but I think it's faster to scan the file with your eyes instead of calling some command).
> >
> > make distcheck works for me ;) make disttest is also useful -- if a file is missing, it should cause a test to fail anyway.
>
> Yeah, they are useful but do you call them after (or better: before) each commit? I do so before each bigger change but for small things I often simply forget it (or avoid it because it can take ages).

before every commit where you've added or removed files. no question of that ;)

--j.
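Alongside make distcheck, a quick scripted check can catch the two problems Malte mentions (entries out of order, and entries listed twice) without eyeballing the file. The sketch below is hypothetical: it assumes MANIFEST is kept in plain `sort` (byte-wise) order with the file name as the first field, and it does not look at the disk at all; ExtUtils::Manifest's `manicheck` already reports listed files that are missing.

```perl
use strict;
use warnings;

# Hypothetical helper: given the (chomped) lines of a MANIFEST file, report
# entries that are duplicated or out of byte-wise sort order. Only the first
# whitespace-separated field (the file name) is compared; anything after it
# is treated as a comment. Blank lines and '#' lines are skipped.
sub manifest_problems {
    my @lines = @_;
    my (@problems, %seen, $prev);
    for my $line (@lines) {
        next if $line =~ /^\s*(#|$)/;
        my ($file) = split ' ', $line;
        push @problems, "duplicate: $file" if $seen{$file}++;
        push @problems, "out of order: $file (after $prev)"
            if defined $prev && ($prev cmp $file) > 0;
        $prev = $file;
    }
    return @problems;
}
```

Feed it the chomped lines of MANIFEST; an empty return list means the file is sorted and free of duplicates.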
Re: [Bug 3848] SA 3.0 time outs with amavis+razor
actually, you're right on both; I just checked with perl -e in perl 5.8.4. I must have been thinking of java instead of perl ;)

--j.
Sequence analysis/bioinformatics
A very interesting paper at Toorcon -- the use of bioinformatics techniques to perform black-box protocol reverse-engineering. Again, this is likely to be useful for automated discovery of antispam regexp rules... worth a read: http://www.baselineresearch.net/PI/PI-Toorcon.pdf --j.
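As a rough illustration of the kind of building block such work draws on, here is global sequence alignment (Needleman-Wunsch) scoring in Perl, applied to strings rather than DNA. This is a generic textbook technique with an assumed scoring scheme (+1 match, -1 mismatch, -1 gap), not necessarily the algorithm or parameters the paper uses; two spam lines that score highly against each other would be candidates for a shared regexp.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Needleman-Wunsch global alignment score of two strings.
# Default scoring (an assumption, easily overridden): match +1,
# mismatch -1, gap -1. Keeps only two rows of the DP table at a time.
sub nw_score {
    my ($s, $t, %opt) = @_;
    my $match    = $opt{match}    // 1;
    my $mismatch = $opt{mismatch} // -1;
    my $gap      = $opt{gap}      // -1;
    my @a = split //, $s;
    my @b = split //, $t;

    # Row 0: aligning a prefix of $t against nothing costs one gap per char.
    my @prev = map { $_ * $gap } 0 .. scalar @b;
    for my $i (1 .. scalar @a) {
        my @cur = ($i * $gap);    # column 0: prefix of $s against nothing
        for my $j (1 .. scalar @b) {
            my $diag = $prev[$j - 1]
                     + ($a[$i - 1] eq $b[$j - 1] ? $match : $mismatch);
            $cur[$j] = max($diag, $prev[$j] + $gap, $cur[$j - 1] + $gap);
        }
        @prev = @cur;
    }
    return $prev[-1];
}
```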
Re: svn commit: rev 51805 - spamassassin/trunk
[EMAIL PROTECTED] writes:
> Author: mss
> Date: Sat Oct 2 08:29:31 2004
> New Revision: 51805
>
> Modified:
>    spamassassin/trunk/Makefile.PL
> Log:
> Just for fun...

what does this do? could we get some more descriptive commit messages, and possibly some discussion before the top-level Makefile.PL is changed like this?

also, how does make manifest update the manifest? The whole idea of that file is that it is *manually* maintained, not automatically, to avoid accidental inclusion of built files.

--j.

> Modified: spamassassin/trunk/Makefile.PL
> ===================================================================
> --- spamassassin/trunk/Makefile.PL (original)
> +++ spamassassin/trunk/Makefile.PL Sat Oct 2 08:29:31 2004
> @@ -198,7 +198,10 @@
>      'dist' => {
>        COMPRESS => 'gzip -9f',
>        SUFFIX => 'gz',
> -      DIST_DEFAULT => 'tardist'
> +      DIST_DEFAULT => 'tardist',
> +
> +      CI => 'svn commit',
> +      RCS_LABEL => 'true',
>      },
>      'clean' => {
>        FILES => join(' ' =>
Re: svn commit: rev 53755 - spamassassin/trunk/lib/Mail/SpamAssassin/Plugin
BTW, note that plugins *should* be able to push their own entries onto the $conf->{registered_commands} list. That is, in my opinion, much cleaner than the current parse_config() API, and may be worthwhile as a way for future plugins to do configuration. May need a little work, though ;)

--j.

[EMAIL PROTECTED] writes:
> Author: felicity
> Date: Mon Oct 4 15:16:21 2004
> New Revision: 53755
>
> Modified:
>    spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Hashcash.pm
>    spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Razor2.pm
> Log:
> the hashcash and razor2 plugins use the standard parser functions to set values from the configuration. however since there's no way to deal with the errors in a standard manner right now (see bug 3869), set a standard-ish function in the plugin itself to deal with issues. basically the same code as the parser itself.
>
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Hashcash.pm
> ===================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Hashcash.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Hashcash.pm Mon Oct 4 15:16:21 2004
> @@ -68,6 +68,7 @@
>    my $conf = $opts->{conf};
>    my $key = $opts->{key};
>    my $value = $opts->{value};
> +  my $line = $opts->{line};
>
>  =over 4
>
> @@ -78,7 +79,11 @@
>  =cut
>
>    if ( $key eq 'use_hashcash' ) {
> -    $conf->{use_hashcash} = $value+0; return 1;
> +    $self->handle_parser_error($opts,
> +      Mail::SpamAssassin::Conf::Parser::set_numeric_value($conf, $key, $value, $line)
> +    );
> +    $self->inhibit_further_callbacks();
> +    return 1;
>    }
>
>  =item hashcash_accept [EMAIL PROTECTED] ...
> @@ -100,7 +105,9 @@
>  =cut
>
>    if ( $key eq 'hashcash_accept' ) {
> -    $conf->add_to_addrlist ('hashcash_accept', split (/\s+/, $value)); return 1;
> +    $conf->add_to_addrlist ('hashcash_accept', split (/\s+/, $value));
> +    $self->inhibit_further_callbacks();
> +    return 1;
>    }
>
>  =item hashcash_doublespend_path /path/to/file (default: ~/.spamassassin/hashcash_seen)
> @@ -116,7 +123,11 @@
>  =cut
>
>    if ( $key eq 'hashcash_doublespend_path' ) {
> -    $conf->{hashcash_doublespend_path} = $value; return 1;
> +    $self->handle_parser_error($opts,
> +      Mail::SpamAssassin::Conf::Parser::set_string_value($conf, $key, $value, $line)
> +    );
> +    $self->inhibit_further_callbacks();
> +    return 1;
>    }
>
>  =item hashcash_doublespend_file_mode (default: 0700)
> @@ -130,11 +141,47 @@
>  =cut
>
>    if ( $key eq 'hashcash_doublespend_file_mode' ) {
> -    $conf->{hashcash_doublespend_file_mode} = $value+0; return 1;
> +    $self->handle_parser_error($opts,
> +      Mail::SpamAssassin::Conf::Parser::set_numeric_value($conf, $key, $value, $line)
> +    );
> +    $self->inhibit_further_callbacks();
> +    return 1;
>    }
>
>    return 0;
>  }
> +
> +sub handle_parser_error {
> +  my($self, $opts, $ret_value) = @_;
> +
> +  my $conf = $opts->{conf};
> +  my $key = $opts->{key};
> +  my $value = $opts->{value};
> +  my $line = $opts->{line};
> +
> +  my $msg = '';
> +
> +  if ($ret_value && $ret_value eq $Mail::SpamAssassin::Conf::INVALID_VALUE) {
> +    $msg = "config: SpamAssassin failed to parse line, ".
> +      "\"$value\" is not valid for \"$key\", ".
> +      "skipping: $line";
> +  }
> +  elsif ($ret_value && $ret_value eq $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE) {
> +    $msg = "config: SpamAssassin failed to parse line, ".
> +      "no value provided for \"$key\", ".
> +      "skipping: $line";
> +  }
> +
> +  return unless $msg;
> +
> +  if ($conf->{lint_rules}) {
> +    warn $msg."\n";
> +  } else {
> +    dbg($msg);
> +  }
> +  $conf->{errors}++;
> +  return;
> +}
>
>  ###
>
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Razor2.pm
> ===================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Razor2.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Razor2.pm Mon Oct 4 15:16:21 2004
> @@ -87,7 +87,9 @@
>  =cut
>
>    if ($key eq 'razor_timeout') {
> -    Mail::SpamAssassin::Conf::Parser::set_numeric_value($conf, $key, $value, $line);
> +    $self->handle_parser_error($opts,
> +      Mail::SpamAssassin::Conf::Parser::set_numeric_value($conf, $key, $value, $line)
> +    );
>      $self->inhibit_further_callbacks();
>      return 1;
>    }
> @@ -100,13 +102,48 @@
>  =cut
>
>    if ($key eq 'razor_config') {
> -    Mail::SpamAssassin::Conf::Parser::set_string_value($conf, $key, $value, $line);
> +    $self->handle_parser_error($opts,
> +      Mail::SpamAssassin::Conf::Parser::set_string_value($conf, $key, $value, $line)
> +    );
>      $self->inhibit_further_callbacks();
>      return 1;
>    }
>
>    return 0;
>  }
> +
> +sub handle_parser_error {
> +  my($self, $opts, $ret_value) = @_;
> +
> +  my $conf = $opts->{conf};
> +  my
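Along the lines of the suggestion that plugins declare their settings rather than hand-code a parse_config() if/elsif chain, a table-driven handler might look like the sketch below. Everything in it (`%SETTINGS`, `apply_setting`, the `messages` key) is hypothetical and self-contained; it does not use SpamAssassin's real Conf or Parser API, and only mirrors the reporting behaviour of handle_parser_error() in spirit.

```perl
use strict;
use warnings;

# Hypothetical declaration table: setting name => value type.
my %SETTINGS = (
    use_hashcash              => 'numeric',
    hashcash_doublespend_path => 'string',
);

# Apply one config line. Returns 1 if the key is ours (even if the value
# was rejected), 0 so the caller can offer the line to someone else.
sub apply_setting {
    my ($conf, $key, $value, $line) = @_;
    my $type = $SETTINGS{$key} or return 0;

    my $problem;
    if (!defined $value || $value eq '') {
        $problem = qq{no value provided for "$key"};
    }
    elsif ($type eq 'numeric' && $value !~ /^-?\d+(?:\.\d+)?$/) {
        $problem = qq{"$value" is not valid for "$key"};
    }

    if (defined $problem) {
        # Report and count the error instead of dying, as the plugins do.
        push @{ $conf->{messages} },
            "config: failed to parse line, $problem, skipping: $line";
        $conf->{errors}++;
    }
    else {
        $conf->{$key} = $type eq 'numeric' ? $value + 0 : $value;
    }
    return 1;
}
```

Adding a setting then means adding one table entry rather than another copy of the parse-and-report block.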
Re: improving SURBL without the foot-shooting
Kelsey Cummings writes:
> On Tue, Oct 05, 2004 at 03:25:55AM -0700, Jeff Chan wrote:
> > 4. SURBL query traffic mostly good if you subtract the blacklisted ones
> > But any big, as-yet-undetected spam domains can also generate much traffic.
>
> What if you were to have a friendly ISP that would be willing to send you an anonymized data feed that looked something like:
>
> <sa score><tab><spam/ham><tab><url><tab><url><tab><url>\n
>
> It wouldn't be very hard to send this information in realtime.

funnily enough, I have some IPC::DirQueue code to do this in a low-impact, low-load manner ;)

--j.
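The feed line Kelsey describes is simple enough to produce and consume with a pair of helpers, whatever transport carries it. The function names and the one-decimal score formatting are assumptions for illustration; tabs and newlines are stripped from URLs so a hostile URL cannot break the line framing.

```perl
use strict;
use warnings;

# Hypothetical helpers for the feed format:
#   <sa score> TAB <spam|ham> TAB <url> TAB <url> ... NEWLINE
sub format_feed_line {
    my ($score, $is_spam, @urls) = @_;
    # Remove framing characters from each URL before joining.
    my @clean = map { (my $u = $_) =~ s/[\t\r\n]//g; $u } @urls;
    return join("\t", sprintf('%.1f', $score), ($is_spam ? 'spam' : 'ham'), @clean) . "\n";
}

# Returns a hashref, or nothing if the line is not in the expected shape.
sub parse_feed_line {
    my ($line) = @_;
    chomp $line;
    my ($score, $class, @urls) = split /\t/, $line;
    return unless defined $class && $class =~ /^(?:spam|ham)$/;
    return { score => $score + 0, class => $class, urls => \@urls };
}
```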
Re: improving SURBL without the foot-shooting
Kelsey Cummings writes:
> On Tue, Oct 05, 2004 at 05:05:18PM -0700, Justin Mason wrote:
> > Kelsey Cummings writes:
> > > On Tue, Oct 05, 2004 at 03:25:55AM -0700, Jeff Chan wrote:
> > > > 4. SURBL query traffic mostly good if you subtract the blacklisted ones
> > > > But any big, as-yet-undetected spam domains can also generate much traffic.
> > >
> > > What if you were to have a friendly ISP that would be willing to send you an anonymized data feed that looked something like:
> > >
> > > <sa score><tab><spam/ham><tab><url><tab><url><tab><url>\n
> > >
> > > It wouldn't be very hard to send this information in realtime.
> >
> > funnily enough, I have some IPC::DirQueue code to do this in a low-impact, low-load manner ;)
>
> I was actually thinking the easiest way to pass this data in realtime would be to send it to surbl's colo at sonic via syslog. SA can already generate it and syslogd can write to a named pipe for processing. Makes it easy to get running.

well, that's true! didn't think of that.

> But, IPC::DirQueue is useful. Taking a queue from it I rewrote all of my spam processing stuff to operate as a Maildir client. A single thread has proven to be fast enough and a lot better than passing to each processing bit via procmail. If I find that it needs more than one processing thread to keep up I'll probably go steal lots of your code. :-p

that's the thing -- it's designed so that if you need more threads, just start more processes. you don't even need to synchronize them externally, it does it itself!

(ps: did you notice I put up a version that does the hashing thing you suggested?)

--j.
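The "just start more processes" property comes from claiming jobs with an atomic filesystem operation. The following is a generic illustration of that technique, not IPC::DirQueue's actual implementation: each job is a file in a queue directory, and a worker claims one by rename()-ing it into an active directory, which succeeds for exactly one worker. It assumes both directories live on the same filesystem, where rename() is atomic; `claim_next_job` is a hypothetical name.

```perl
use strict;
use warnings;
use File::Spec;

# Claim the oldest-named job in $queue_dir by moving it into $active_dir.
# If another worker renames the same file first, our rename() fails and we
# simply try the next job; no external locking is involved.
sub claim_next_job {
    my ($queue_dir, $active_dir, $worker_id) = @_;
    opendir(my $dh, $queue_dir) or die "cannot open $queue_dir: $!";
    my @jobs = sort grep { !/^\./ } readdir $dh;
    closedir $dh;
    for my $job (@jobs) {
        my $from = File::Spec->catfile($queue_dir, $job);
        my $to   = File::Spec->catfile($active_dir, "$job.$worker_id");
        return $to if rename($from, $to);   # we won this job
    }
    return;                                 # nothing left to claim
}
```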
Re: What's up with reviewing tickets?
Theo Van Dinter writes:
> There are currently 5 (of 8) tickets in the 3.0.1 queue in the review state. One has been in review state since 9-29 (3831) and needs another +1, 3872 is major (needs another +1), 3741 and 3865 had patch added today, and 3806.
>
> I'd like to get 3.0.1 out either this week or next, BTW.

agreed -- slow reviewing is not a good thing... all the patches are quite simple and amenable to visual review.

--j.
Re: limit on number of URIs decoded?
Sidney Markowitz writes:
> Justin Mason wrote:
> > The first fix is truncation of the text before passing to TextCat. Michael, I think you were looking at this? the results are impressive, if the text is truncated to 32k bytes:
>
> It was me.

oops! sorry ;)

> I've been looking at ways to not have to create so much garbage (I'm a lisp hacker -- I'm not using the word in the pejorative sense) in that loop in create_lm, but the simplest way of dealing with this is to truncate $input to perhaps 10,000 bytes in the call to create_lm. Since TextCat is just a heuristic for determining the language and there is no incentive for spammers to, for example, prefix a Spanish language message with 10,000 bytes of English words just to slip through the spam filters of English-only speakers, the first 10,000 bytes is plenty as a limit. Language recognition accuracy does not improve noticeably past one or two thousand characters, while going to less than 10,000 does not provide much additional speed or memory benefit. If there is no real language text in the first 10,000 characters of rendered body, then it will not be recognized as any language and the rule will not fire, failing safely.
>
> I propose putting in the truncate for 3.0.1 as a quick and safe way around the problem we saw with that malformed MIME message. I'll keep playing with the loop just in case I can speed it up enough for the 3.1 time frame to not have to truncate, but we should do the quick fix right away.

+1 on truncation. I think it's safe for 3.1.0 as well, fwiw.

--j.
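A minimal sketch of the truncation fix, assuming a TextCat-style model built from character n-gram counts. `create_lm_truncated` and its tokenization and `_` padding are hypothetical stand-ins rather than the plugin's real create_lm(); the point is only that the input is cut to a fixed size before any n-grams are generated, so the cost is bounded by the constant rather than by message size. Note that substr() counts characters, which equals bytes only for undecoded byte strings.

```perl
use strict;
use warnings;

use constant MAX_LM_INPUT => 10_000;   # the limit proposed in the thread

# Count character n-grams (lengths 1..$max_n) over at most MAX_LM_INPUT
# characters of input. Words are padded with '_' so that n-grams at word
# boundaries are distinguishable.
sub create_lm_truncated {
    my ($input, $max_n) = @_;
    $max_n ||= 5;
    my $text = substr($input, 0, MAX_LM_INPUT);
    my %ngrams;
    for my $word (split /[^[:alpha:]]+/, lc $text) {
        next unless length $word;
        my $padded = "_${word}_";
        for my $n (1 .. $max_n) {
            for my $i (0 .. length($padded) - $n) {
                $ngrams{ substr($padded, $i, $n) }++;
            }
        }
    }
    return \%ngrams;
}
```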
Re: svn commit: rev 54716 - in spamassassin/trunk: . t
Sidney Markowitz writes:
> Justin Mason wrote:
> > the test should be a no-op without that module; did that not work?
>
> This is extracted from output of make test, running under Cygwin with perl 5.8.5:
>
> t/memory_cycles.
> Can't locate Devel/Cycle.pm in @INC (@INC contains: t . ../blib/lib /c/sasvn/trunk/blib/lib /c/sasvn/trunk/blib/arch /usr/lib/perl5/5.8.5/cygwin-thread-multi-64int /usr/lib/perl5/5.8.5 /usr/lib/perl5/site_perl/5.8.5/cygwin-thread-multi-64int /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.5/cygwin-thread-multi-64int /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl) at t/memory_cycles.t line 66.
> BEGIN failed--compilation aborted at t/memory_cycles.t line 66.

oops. try current svn... r54765 should fix it...

--j.
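The failure above is the compile-time behaviour of `use`: a missing module aborts the file with "BEGIN failed" before any skip logic can run. Loading the optional module with require inside a block eval at run time avoids that. The helper name `have_module` is hypothetical; the commented lines show how it would pair with Test::More's `skip_all`.

```perl
use strict;
use warnings;

# Return 1 if $module can be loaded, 0 otherwise, without dying.
# require with a file name avoids a string eval.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Typical use at the top of a test file:
#   use Test::More;
#   plan skip_all => 'Devel::Cycle not installed' unless have_module('Devel::Cycle');
#   plan tests => 3;
```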
Re: 3.0.1 this week?
Michael Parker writes:
> On Thu, Oct 21, 2004 at 12:01:20AM -0400, Theo Van Dinter wrote:
> > I'd like to get 3.0.1 released in the next few days. There are 2 tickets left in the queue: can we get them done up in the next day or so?
>
> +1 on a release soon.

+1 here too.
Re: VOTE: release 3.0.1
Daniel Quinlan writes:
> I propose we release SpamAssassin 3.0.1. All bugs are closed now.

+1
SpamAssassin 3.0.1 is released!
SpamAssassin 3.0.1 is released! 3.0.1 contains some important bugfixes, and is recommended.

Highlights:

- excessive memory-usage fixes
- bug fixed which stopped DCC, Pyzor working with amavisd
- deprecate RCVD_IN_RFC_IPWHOIS
- user_prefs were staying active between different spamd users, fixed
- user_prefs blacklist entries were not working in spamd, fixed
- excessive time and memory consumption when ok_languages is used, fixed
- sa-learn -u switch to specify the username for virtual environments
- avoid bug in Sys::Hostname::Long that renames the hostname when make test is run
- whitelist the top 125 queried SURBL domains common in nonspam

Pick it up at http://spamassassin.apache.org/ !

md5sum of archive files:

  83f60f97c823d9b8df19309247fe33eb  Mail-SpamAssassin-3.0.1.tar.bz2
  759e0486b07c4a03aa340d4a04e1d849  Mail-SpamAssassin-3.0.1.tar.gz
  e42d4f6b7228f899efdfdce03b8851a0  Mail-SpamAssassin-3.0.1.zip

sha1sum of archive files:

  7ad929efc388ebdf26da052c6fca958c7541bb4f  Mail-SpamAssassin-3.0.1.tar.bz2
  a3aebae1bf3c97830e540c42dc64791787d966c9  Mail-SpamAssassin-3.0.1.tar.gz
  e4f23ad8251914bb240a4e42438310a263ca5056  Mail-SpamAssassin-3.0.1.zip

The release files also have a .asc accompanying them. The file serves as an external GPG signature for the given release file. The signing key is available via the wwwkeys.pgp.net key server, as well as http://spamassassin.apache.org/released/GPG-SIGNING-KEY

The key information is:

  pub  1024D/265FA05B 2003-06-09 SpamAssassin Signing Key [EMAIL PROTECTED]
       Key fingerprint = 26C9 00A4 6DD4 0CD5 AD24 F6D7 DEE0 1987 265F A05B

--j.
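To check a downloaded archive against the sums above, Digest::MD5 and Digest::SHA (both ship with modern perls) can compute both digests. This sketch covers only the checksums and reads the whole file into memory, which is fine for release tarballs; the .asc signature still needs to be checked separately with gpg against the signing key. `file_digests` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Digest::SHA qw(sha1_hex);

# Return (md5 hex, sha1 hex) of a file's raw bytes.
sub file_digests {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "cannot open $path: $!";
    my $data = do { local $/; <$fh> } // '';
    close $fh;
    return (md5_hex($data), sha1_hex($data));
}
```

Compare the second value with the sha1sum line for the archive you downloaded; a mismatch usually means a stale or partial mirror copy.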
Re: svn commit: rev 55350 - spamassassin/site
> -font-family: verdana,lucida,helvetica,sans-serif;
> +font-family: arial,helvetica,sans-serif;

just to reiterate -- I'm -1 on this change. It looks awful by comparison (where Verdana is available), at least under Firefox on linux. Some discussion and agreement is essential before changing branding elements like this!

--j.
3.0.1 /dist/ area screwups
Theo Van Dinter writes:
> On Fri, Oct 22, 2004 at 08:09:10PM -0700, Justin Mason wrote:
> > SpamAssassin 3.0.1 is released! 3.0.1 contains some important bugfixes, and is recommended.
>
> Another couple of notes about the release. Apparently the dist/spamassassin/source files for 3.0.0 were removed -- so the only version available for download now is 3.0.1. Don't we want to keep the older version(s) available for at least some period of time?

This is going by what the ASF guidelines for usage of the mirrored www.apache.org/dist/ say *must* be done. see http://www.apache.org/dev/mirrors.html , http://httpd.apache.org/dev/release.html , http://cvs.apache.org/~bodewig/mirror.html , http://jakarta.apache.org/site/convert-to-mirror.html .

However, I think I agree -- leaving the old versions there for a short while makes more sense. Take a read over those and see what you think.

The fundamental problem this time around was that I miscomputed that ?update parameter. we should create a simple build script that generates the correct value for us to cut and paste and cut down on faulty brain-work. ;)

There *is* another problem, though -- since the downloads.html/.cgi page is on the single un-mirrored site, and the downloads are on the mirrors which may be up to 24 hours out of sync, we would still have to use the ?update=200409211830 parameter on the downloads.cgi URL to ensure that only up-to-date mirrors are used; otherwise the download link will either

- (a) if it points to Mail-SpamAssassin-3.0.1.tar.gz, return a 404
- (b) if it points to Mail-SpamAssassin-current.tar.gz, return the old file which will not match the checksums, and that's not good.

> Also, the dist/spamassassin/source files were removed, but not the symlinks to them in dist/spamassassin -- so there were 12 bad symlinks lying around. I've already received a complaint note about it, so I removed the bad symlinks.

oops. my fault! we need to update build/README to reflect that.
> I really don't understand why we put the source files in the source directory, and then have symlinks for them all in the parent directory. Just put the source files in the parent directory!

Again, ASF guidelines. It might be worth asking infrastructure@ if the guidelines can be ignored in this case... although I'm not sure there's a big win.

--j.
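Since the root problem here was a hand-computed ?update value, a tiny helper can generate it. The sketch assumes the value is a UTC YYYYMMDDHHMM stamp, matching the shape of 200409211830 in the message; whether downloads.cgi expects UTC or local time is an assumption to verify before use, and `update_param` is a hypothetical name.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format a Unix timestamp (default: now) as YYYYMMDDHHMM in UTC.
sub update_param {
    my ($epoch) = @_;
    $epoch = time unless defined $epoch;
    return strftime('%Y%m%d%H%M', gmtime($epoch));
}
```

For example, passing `(stat $tarball)[9]` stamps the value from the upload time of the release file itself.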
Re: debug levels in trunk
Daniel Quinlan writes:
> was: Re: [Bug 3931] [review] remove the annoying 'inhibited further callbacks' debug message
>
> > > (a) new debug code in 3.1.0 doesn't have higher debug levels
> >
> > Really? That kind of sucks (although we never really used it anyway...)
>
> While we have debug levels in trunk ...
>
> - dbg()  debugging message
> - info() informational message (okay to be logged by spamd always)
> - warn() something went very wrong
> - die()  ouch!
>
> ... I agree that we do not need more verbose debugging levels than dbg(). I think "more verbose than dbg()" means you comment out the dbg() statement. :-)

yeah, I really agree -- I have used higher debugging levels only *once*; for the RBL code.

--j.
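The flat scheme in the list above amounts to a single on/off check for dbg() and unconditional output for the rest. A minimal sketch with hypothetical names (`$DEBUG`, `$SINK`, `log_warn`, the last chosen so as not to override Perl's built-in warn), not the actual trunk logger; fatal errors are left to die().

```perl
use strict;
use warnings;

our $DEBUG = 0;                                   # e.g. set from -D
our $SINK  = sub { print STDERR $_[0], "\n" };    # replaceable output

sub dbg      { $SINK->("dbg: $_[0]") if $DEBUG }  # only when debugging
sub info     { $SINK->("info: $_[0]") }           # always; fine for spamd
sub log_warn { $SINK->("warn: $_[0]") }           # something went very wrong
```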
Re: [Bug 3940] ArchiveIterator uses opt_j for two different things
> > I'd strongly prefer (I'm probably -1 on creating two new options for this one) to keep opt_j as the number of processes (it parallels make -j) and add a new option for the temporary file vs. in-memory option. The temporary file thing postdates -j by a long period and can just move to a new option.
>
> I think just adding a new option for storage is doable, but FWIW I don't really care about parallels ... -j since this is all internal API names. The commandline can stay the same, but unless you're used to the module, opt_j isn't very descriptive of what the value means.

oh btw, on that point, I'd be very pro adding *new*, meaningful names for opt_j, opt_n et al as they are used in the Mail::SpamAssassin::ArchiveIterator class, and leaving opt_j, opt_n et al as backwards-compat aliases. I agree, they don't make much sense for users of that module apart from mass-check.

--j.
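The aliasing approach could be done once, at construction time, so the rest of the class only reads the descriptive name. In this sketch `num_processes` is a hypothetical new name for opt_j (the only option whose meaning the thread states); other opt_* keys would be added to the map the same way.

```perl
use strict;
use warnings;

# Legacy key => descriptive key. 'num_processes' is hypothetical.
my %LEGACY_ALIAS = (opt_j => 'num_processes');

sub normalize_opts {
    my (%opts) = @_;
    for my $old (keys %LEGACY_ALIAS) {
        my $new = $LEGACY_ALIAS{$old};
        # Old callers pass opt_j: copy it across unless the new name is set.
        $opts{$new} = $opts{$old} if exists $opts{$old} && !exists $opts{$new};
        # Keep the legacy key in sync so code still reading it keeps working.
        $opts{$old} = $opts{$new} if exists $opts{$new};
    }
    return \%opts;
}
```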
Re: svn commit: rev 56270 - spamassassin/trunk/masses
> Log:
> work on the mass-check output a bit, state when scan has ended and run begins (rough approximation since the run has already begun at that point), format the lines better, etc.

hey btw are we going to merge Duncan's mods? we really should, that code will rot otherwise.

--j.
proposal: an automated rule-qa system
So, we were discussing the rules situation -- ie. that we've been pretty crap at getting rules into the distro. I proposed this, and I think we're reasonably into the idea as a way to help out.

We add a web-app somewhere that periodically scrapes bugzilla for bugs on the rules component which contain some token from trusted users indicating that they contain rules that need testing. That then extracts rules from attachments/text on that bug, and

- (a) checks out SVN trunk
- (b) adds them to the rules dir of that in a temporary file
- (c) runs a mass-check on those rules
- (d) does simple lint using spamassassin --lint and lint-rules-from-freqs
- (e) does some kind of basic S/O testing
- (f) it may be that we can also check in the rules into SVN for a full nightly mass-check from all the people doing those, in which case it should come up with the results from that, nicely snipped out of the full reports.
- (g) if we do (f), we can even get the results, segmented by the age of the corpus used! in other words, give us a picture of the freqs based on how old the messages it was hitting on were.
- (h) -- possibly -- do a quick perceptron run to evaluate if the rule overlaps with other rules too much.

Finally, it'll display the results at a given URL -- probably based on the bug and comment numbers, so it's easily hyperlinkable.

Using bugzilla as the backend is useful, btw, as that gives us

- threaded discussion of rules
- contributor CLA status tracking
- good ways to get lists and overviews of what contributions are available and their status
- gatewayed to mailing list, and viewable via www

Sound useful? That should at least take some legwork out of rule QA, and stop us committers being a bottleneck in the process.

--j.
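The first step of the proposed web-app, pulling candidate rules out of bug text, could start as a simple line filter. This is a hypothetical sketch: the directive list is a small subset of real SpamAssassin config keywords, and it assumes rule names are upper-case, which keeps prose such as "the body of the message" from matching.

```perl
use strict;
use warnings;

# A small subset of rule-related config directives.
my $DIRECTIVE = qr/(?:header|body|rawbody|uri|full|meta|describe|score|tflags)/;

# Return (\@rule_lines, \@rule_names) found in free-form text.
sub extract_rules {
    my ($text) = @_;
    my (@lines, %names);
    for my $line (split /\r?\n/, $text) {
        $line =~ s/^\s*>+\s*//;    # tolerate quoted lines in bug comments
        next unless $line =~ /^\s*($DIRECTIVE)\s+([A-Z][A-Z0-9_]*)\b/;
        push @lines, $line;
        $names{$2} = 1;
    }
    return (\@lines, [sort keys %names]);
}
```

The extracted lines could then be written to the temporary rules file from step (b) and the names used to pick results out of the mass-check output.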
Re: Java client to spamd
Kurt Humes writes:
> I am beginning to build a Java library to act as a client to spamd, not using JNI however. Has anyone ever done something similar and if so what are the roadblocks that you have come across.

Kurt, I'm unaware of anything, but it should be very, very straightforward.

(only (minor) roadblock: there was a bug in whitespace handling at the end of the server response to one of the request verbs, can't remember which one, but it's documented in spamd/PROTOCOL.)

--j.
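For anyone writing a client, the exchange is small enough to sketch. The Perl below builds a CHECK request and parses the reply's status line and Spam: header; the header names, the protocol version string and the reply shape are assumptions based on spamd/PROTOCOL and should be checked against that file. The parser allows optional whitespace around separators, which also tolerates stray trailing whitespace in responses.

```perl
use strict;
use warnings;

# Build a CHECK request. $message must be a byte string so that
# length() counts bytes for Content-length.
sub build_check_request {
    my ($message, $user) = @_;
    my $req = "CHECK SPAMC/1.2\r\n";
    $req .= "Content-length: " . length($message) . "\r\n";
    $req .= "User: $user\r\n" if defined $user;
    return $req . "\r\n" . $message;
}

# Parse e.g. "SPAMD/1.1 0 EX_OK\r\nSpam: True ; 15.2 / 5.0\r\n\r\n".
sub parse_check_response {
    my ($resp) = @_;
    my ($status_line) = $resp =~ /\A([^\r\n]*)/;
    my ($code) = $status_line =~ m{^SPAMD/\d+\.\d+\s+(\d+)};
    my ($flag, $score, $threshold) =
        $resp =~ /^Spam:\s*(\w+)\s*;\s*(-?[\d.]+)\s*\/\s*(-?[\d.]+)/mi;
    return {
        code      => $code,
        is_spam   => (defined $flag && lc $flag eq 'true') ? 1 : 0,
        score     => $score,
        threshold => $threshold,
    };
}
```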
Re: [SpamAssassin Wiki] Updated: FrontPage
btw, I was thinking of keeping this link around, on the wiki at least, in case the slides became available... (hint hint ;)

--j.

[EMAIL PROTECTED] writes:
> Date: 2004-11-19T21:55:15
> Editor: DanielQuinlan [EMAIL PROTECTED]
> Wiki: SpamAssassin Wiki
> Page: FrontPage
> URL: http://wiki.apache.org/spamassassin/FrontPage
>
> remove conference link
>
> Change Log:
TIP: very useful '%seen' trick
this just came up on perl5-porters... http://www.nntp.perl.org/group/perl.perl5.porters/96100 :

> Subject: Re: sharing hash-values
> From: btilly[at]gmail.com (Ben Tilly)
> ...
> I forgot who I first saw mention this, possibly gbarr, but the following variation on %seen seems to be the fastest in native Perl:
>
>   my %seen;
>   undef @[EMAIL PROTECTED];
>   for (@things) {
>     if (exists $seen{$_}) {
>       ...
>     }
>   }
>
> This avoids creating the hash values entirely. (Or at least it did a few revs of Perl ago.)
>
> Cheers,
> Ben

sure enough, using the shared undef SV as the magic value is 7% faster and doesn't allocate the scalars, so it reduces RAM usage ;) definitely the better idiom.

Benchmark:

  : jm 1122...; perl psc
                   Rate traditional  undef_keys
  traditional 100014/s          --         -6%
  undef_keys  106684/s          7%          --

script:

  #!/usr/bin/perl -w
  use Benchmark qw(:all);
  use strict;

  my @things = qw( foo bar baz foo foo foo bar bar baz baz blarg );

  cmpthese (-2, {
    'traditional' => sub {
      my $res = '';
      my %seen;
      for (@things) {
        next if $seen{$_};
        $seen{$_} = 1;
        $res .= "$_\n";
      }
    },
    'undef_keys' => sub {
      my $res = '';
      my %seen;
      # undef @[EMAIL PROTECTED];
      for (@things) {
        next if exists $seen{$_};
        undef $seen{$_};
        $res .= "$_\n";
      }
    }
  });

(ps: note the 'undef @[EMAIL PROTECTED];' -- can be used to undef a list of already-seen special values before the loop.)

--j.
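The exact pre-loop line is garbled in this archive, so here is the same membership idea using a form that is certainly valid Perl: the slice assignment `@h{@list} = ()` creates all the keys at once with undef values, and exists() then tests membership without ever looking at a value. `common_items` is a hypothetical helper.

```perl
use strict;
use warnings;

# Items of @$b_list that also occur in @$a_list, in @$b_list's order,
# each reported once. Both hashes are used purely as key sets.
sub common_items {
    my ($a_list, $b_list) = @_;
    my %in_a;
    @in_a{@$a_list} = ();          # keys only; every value is undef
    my (%seen, @out);
    for (@$b_list) {
        next unless exists $in_a{$_};
        next if exists $seen{$_};
        undef $seen{$_};           # same undef-value trick as in the benchmark
        push @out, $_;
    }
    return @out;
}
```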
Re: svn commit: r106135 - /spamassassin/trunk/rules/20_head_tests.cf /spamassassin/trunk/rules/50_scores.cf
btw, would a name prefix sound like a good idea for a convention to indicate rules that exist to catch never-seen-in-the-wild spammer exploits? Something like EVIL or similar?

it'd provide a great way to (a) visually see that those rules not firing is not a problem in hit-frequencies output, and (b) grep them out for the same purpose.

--j.

[EMAIL PROTECTED] writes:
> Author: quinlan
> Date: Sun Nov 21 14:49:14 2004
> New Revision: 106135
>
> Modified:
>    spamassassin/trunk/rules/20_head_tests.cf
>    spamassassin/trunk/rules/50_scores.cf
> Log:
> promote T_FRAGMENTED_MESSAGE to FRAGMENTED_MESSAGE
>
> Modified: spamassassin/trunk/rules/20_head_tests.cf
> Url: http://svn.apache.org/viewcvs/spamassassin/trunk/rules/20_head_tests.cf?view=diff&rev=106135&p1=spamassassin/trunk/rules/20_head_tests.cf&r1=106134&p2=spamassassin/trunk/rules/20_head_tests.cf&r2=106135
> ===================================================================
> --- spamassassin/trunk/rules/20_head_tests.cf (original)
> +++ spamassassin/trunk/rules/20_head_tests.cf Sun Nov 21 14:49:14 2004
> @@ -27,6 +27,12 @@
>  header HEAD_LONG           eval:check_for_long_header()
>  describe HEAD_LONG         Message headers are very long
>
> +# partial messages; currently-theoretical attack
> +# unsurprisingly this hits 0/0 right now.  But should we promote it anyway
> +# to protect against the possibility?
> +header FRAGMENTED_MESSAGE  Content-Type =~ /\bmessage\/partial/i
> +describe FRAGMENTED_MESSAGE Partial message
> +
>  header MISSING_HB_SEP      eval:check_for_missing_hb_separator()
>  describe MISSING_HB_SEP    Missing blank line between message header and body
>
> Modified: spamassassin/trunk/rules/50_scores.cf
> Url: http://svn.apache.org/viewcvs/spamassassin/trunk/rules/50_scores.cf?view=diff&rev=106135&p1=spamassassin/trunk/rules/50_scores.cf&r1=106134&p2=spamassassin/trunk/rules/50_scores.cf&r2=106135
> ===================================================================
> --- spamassassin/trunk/rules/50_scores.cf (original)
> +++ spamassassin/trunk/rules/50_scores.cf Sun Nov 21 14:49:14 2004
> @@ -619,10 +619,9 @@
>  # GTUBE - Generic Test for Unsolicited Bulk Email
>  score GTUBE 1000.000
>
> -# long header test
> +# we dare you
>  score HEAD_LONG 2.5
> -
> -# missing blank line between header and body
> +score FRAGMENTED_MESSAGE 2.5
>  score MISSING_HB_SEP 2.5
>
>  # HTML control test
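Two small checks relevant to this commit: that the FRAGMENTED_MESSAGE pattern matches message/partial (case-insensitively) but not ordinary multipart types, and how the proposed name-prefix convention could be grepped out of hit-frequencies output. The EVIL_ prefix is purely hypothetical, and `rules_with_prefix` assumes the rule name is the last whitespace-separated field on each line.

```perl
use strict;
use warnings;

# The pattern FRAGMENTED_MESSAGE applies to the Content-Type header.
my $fragmented = qr/\bmessage\/partial/i;

# Keep only lines whose last field (the rule name) starts with $prefix.
sub rules_with_prefix {
    my ($prefix, @lines) = @_;
    return grep {
        my @f = split ' ', $_;
        @f && $f[-1] =~ /^\Q$prefix\E/
    } @lines;
}
```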
Re: svn commit: r106170 - /spamassassin/trunk/spamd/spamd.raw
Daniel Quinlan writes:
> [EMAIL PROTECTED] writes:
> > [...]
> >  sub service_unavailable_error {
> >    my ($err) = @_;
> >    my $resp = "EX_UNAVAILABLE";
> > -  print $client "SPAMD/1.0 $resphash{$resp} Service Unavailable: $err\r\n";
> > +  syswrite( $client, "SPAMD/1.0 $resphash{$resp} Service Unavailable: $err\r\n" );
> >    logmsg("service unavailable: $err");
> >  }
>
> Please try to use the more standard perl formatting:
>
>   http://wiki.apache.org/spamassassin/CodingStyle
>
> Thanks!

ah, the foo( bar ) vs. foo(bar) style issue ;)

--j.
Re: svn commit: r106170 - /spamassassin/trunk/spamd/spamd.raw
Sidney Markowitz writes:
> Daniel Quinlan wrote:
> > Please try to use the more standard perl formatting:
>
> Do you see anything wrong other than two of the lines being more than 80 characters? I'll check in an update to fix that as soon as I finish running a make test on the change.

Sidney -- I think it's the foo( bar ) vs. foo(bar) style thing. Daniel prefers the latter -- no extra spaces after the bracket, and we've agreed to go with that. ;)

--j.
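One practical note on the switch from print to syswrite(): syswrite bypasses Perl's output buffering, but it may write fewer bytes than requested (for example on a busy socket), so robust code loops until everything has been sent. `syswrite_all` is a hypothetical helper, not code from spamd, and is written in the foo(bar) style agreed above.

```perl
use strict;
use warnings;

# Write all of $data (a byte string) to $fh with syswrite(), retrying
# after short writes and after EINTR. Returns the number of bytes written.
sub syswrite_all {
    my ($fh, $data) = @_;
    my $offset = 0;
    my $len = length($data);
    while ($offset < $len) {
        my $n = syswrite($fh, $data, $len - $offset, $offset);
        if (!defined $n) {
            next if $!{EINTR};          # interrupted by a signal: retry
            die "syswrite failed: $!";
        }
        $offset += $n;
    }
    return $offset;
}
```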
Re: svn commit: r105955 - in spamassassin/trunk: . lib/Mail
reminder: do we have a consensus what to do about this? can we reinstate the functions in the meantime?

--j.

[EMAIL PROTECTED] writes:
> Author: quinlan
> Date: Sat Nov 20 00:17:49 2004
> New Revision: 105955
>
> Modified:
>    spamassassin/trunk/lib/Mail/SpamAssassin.pm
>    spamassassin/trunk/spamassassin.raw
> Log:
> bug 3856: remove debug_diagnostics() from Mail::SpamAssassin
>
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin.pm
> ===================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin.pm Sat Nov 20 00:17:49 2004
> @@ -1159,35 +1159,6 @@
>
>  ###
>
> -=item $f->debug_diagnostics ()
> -
> -Output some diagnostic information, useful for debugging SpamAssassin
> -problems.
> -
> -=cut
> -
> -sub debug_diagnostics {
> -  my ($self) = @_;
> -
> -  foreach my $module (sort qw(
> -    Net::DNS Razor2::Client::Agent MIME::Base64
> -    IO::Socket::UNIX DB_File Digest::SHA1
> -    DBI URI Net::LDAP Storable
> -  ))
> -  {
> -    my $modver;
> -    if (eval ' require '.$module.'; $modver = $'.$module.'::VERSION; 1;')
> -    {
> -      $modver ||= '(undef)';
> -      dbg("diag: module installed: $module, version $modver");
> -    } else {
> -      dbg("diag: module not installed: $module ('require' failed)");
> -    }
> -  }
> -}
> -
> -###
> -
>  =item $failed = $f->lint_rules ()
>
>  Syntax-check the current set of rules.  Returns the number of
>
> Modified: spamassassin/trunk/spamassassin.raw
> ===================================================================
> --- spamassassin/trunk/spamassassin.raw (original)
> +++ spamassassin/trunk/spamassassin.raw Sat Nov 20 00:17:49 2004
> @@ -240,7 +240,6 @@
>  );
>
>  if ( $opt{'lint'} ) {
> -  $spamtest->debug_diagnostics();
>    my $res = $spamtest->lint_rules();
>    warn "lint: $res issues detected.  please rerun with debug enabled for more information.\n" if ($res);
>    exit $res ? 1: 0;
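If the diagnostics are reinstated, the string eval can be avoided: require each module by file name inside a block eval, then ask for its version through the UNIVERSAL `VERSION` method. `module_versions` is a hypothetical sketch, not the removed function; it returns the data instead of logging it, so the caller can pass each entry to dbg().

```perl
use strict;
use warnings;

# Map each module name to its version, '(undef)' if it loads but sets no
# $VERSION, or undef if it cannot be loaded at all.
sub module_versions {
    my @modules = @_;
    my %found;
    for my $module (sort @modules) {
        (my $file = "$module.pm") =~ s{::}{/}g;
        if (eval { require $file; 1 }) {
            my $ver = $module->VERSION;
            $found{$module} = defined $ver ? $ver : '(undef)';
        }
        else {
            $found{$module} = undef;
        }
    }
    return \%found;
}
```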
req: volunteers to run buildbot slaves
so we're setting up a distributed build-testing system, BuildBot (http://buildbot.sourceforge.net/), for now at http://bugzilla.spamassassin.org:8010/ (that URL may change). it currently has 4 build slaves, building:

- trunk using Red Hat 7.3's perl
- trunk using vanilla perl 5.6.1
- trunk using vanilla perl 5.8.5 with threading
- b3.0 using Red Hat 7.3's perl

If you fancy it, and are running an OS different from the above (!), it might be worthwhile setting up a build slave to extend this... non-Linux platforms especially would be great. Any platform where make test currently passes, or nearly does, would be preferred ;)

Notes:

- the slave process should be kept up and running as much as possible; it's got to be a persistent daemon.
- I'd recommend running it as a non-root user, and not as your own userid. if a miscreant managed to get hostile code into SVN trunk, this setup would pretty quickly run it on your machine.
- it's not *too* CPU hungry -- but it will kick off a compile and make test *every time* someone checks something into SpamAssassin svn! so if that puts you off, this isn't for you ;)

so pretty much, overall, this requires that you have root on some box with a 99%-uptime network connection to set a slave up.

Process to set up a build slave:

[install Twisted 1.3.0. can be omitted if you already have it, or just use "sudo apt-get install twisted" if you're on debian unstable.]
[note that you also need python 2.2 or later installed.]

  wget http://twistedmatrix.com/downloads/Twisted-1.3.0.tar.bz2
  bunzip2 -cd Twisted-1.3.0.tar.bz2 | tar xvf -
  cd Twisted-1.3.0 ; sudo python setup.py install
  cd ..
  wget http://internap.dl.sourceforge.net/sourceforge/buildbot/buildbot-0.6.1.tar.gz
  tar xvfz buildbot-0.6.1.tar.gz
  cd buildbot-0.6.1 ; sudo python setup.py install
  sudo useradd -c "SpamAssassin Buildbot" buildbot
  sudo su - buildbot
  mkdir -p /home/buildbot/slaves

[now, you need the buildbot password. ask on the IRC channel and one of the PMC should be able to set you up with one.]

  PASSWORD=[password]

[give your slave a good name, like debian-stable or ubuntu-hoary-perl585]

  HOST_OS=hostname-osname
  buildbot slave /home/buildbot/slaves/$HOST_OS bugzilla.spamassassin.org:9989 \
      $HOST_OS $PASSWORD

[and mail dev/at/SpamAssassin.apache.org the $HOST_OS string you've chosen.]

[to start the slave process:]

  buildbot start /home/buildbot/slaves/$HOST_OS

[to monitor slave progress/errors:]

  less /home/buildbot/slaves/$HOST_OS/twistd.log

[to start at boot in future, add this line to crontab:]

  @reboot buildbot start /home/buildbot/slaves/hostname-osname

--j.
Re: svn commit: r106600 - /spamassassin/trunk/t/SATest.pm
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Daniel Quinlan writes: [EMAIL PROTECTED] writes: What's the probability that I run into an already used port with the new probably_unused_spamd_port() code? Less than 1 per mill? Ask Murphy... The only chance of a collision is if the port is listed in /etc/services. My system only has 3 TCP ports above 32768 listed. So if my math is right, that's a 0.003% chance of a collision between two processes. The purely random code had a 0.1% chance of a collision between two processes (running at the same time, which could happen), mostly because it only used 1000 ports. A 32768-port random version would have a 0.003% chance of a collision. The routine now tries to ask netstat if that port is already in use. I tested the pattern on Linux, FreeBSD and Windows. If netstat can't be run, no harm is done; the routine will just work as before. The grep is pretty broad, so it might also catch a remote port; then it just tries the next random one. (Hey Murphy, it really can't hit a used port ten times, can it?) I'm not a big fan of shell calls, but it looks (untested) like it'll work on Windows XP too. wow guys -- overkill ;) I think both approaches are wrong. Firstly, checking services seems pointless, because if you ask me, there's actually a *low* likelihood that processes listening on high ports will be listed in /etc/services at all. Here's why: 1. I've heard of very few official services on ports above 32768 in general. So I'd surmise that if one is running, the user who started it just picked a port at random. 2. typically a daemon running on a high port will be something that was started by a user instead of root, and users don't have write perms on /etc/services. Finally, mss' approach is wrong because it's too inefficient, requiring (another) command to be forked every time a t script starts. An easier, portable way to check if a port is in use: use Socket to connect() to it, and regenerate a new port if the connect succeeds.
No fork overhead, no portability worries. - --j. -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFBp9mIMJF5cimLx9ARAn4YAJkB/aTNG9Gm/oGcV+53CVwQnWRiEACgtdkE c/A9EwOAKmpB+b+vmyscqgA= =MO4Z -END PGP SIGNATURE-
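The connect()-based check suggested above can be sketched roughly as follows. This is a hypothetical helper, not the SATest.pm code: it assumes IPv4 loopback, the 32768-65535 range discussed in the thread, and the core IO::Socket::INET module (a wrapper over Socket) rather than raw Socket calls.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical replacement for probably_unused_spamd_port(): pick a random
# port in 32768..65535 and reject it if a TCP connect() to 127.0.0.1
# succeeds (something is listening). No fork, no netstat, no /etc/services.
sub probably_unused_port {
    my ($tries) = @_;
    $tries ||= 10;
    for (1 .. $tries) {
        my $port = 32768 + int(rand(32768));
        my $sock = IO::Socket::INET->new(
            PeerAddr => '127.0.0.1',
            PeerPort => $port,
            Proto    => 'tcp',
            Timeout  => 1,
        );
        if ($sock) {          # connect succeeded: port is in use
            close($sock);
            next;
        }
        return $port;         # connect failed: nothing listening there
    }
    return;                   # every try hit a listening port
}

my $port = probably_unused_port();
print defined $port ? "using port $port\n" : "no free port found\n";
```

A failed connect does not guarantee that a later bind() will succeed (the port could be bound but not listening, or grabbed in the meantime), so the test harness would still need to cope with spamd failing to start.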
MIT spam conference
looks like it *is* indeed on this year -- http://www.spamconference.org/ CFP ends in 4 days though. --j.
Re: Restarting MakeMaker development (fwd)
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Tony Finch writes: perl Build.PL install_base=~ This is different from PREFIX in that it's not going to try and guess how you want things installed based on your system installation. It's just going to plop things into ~/bin, ~/lib, ~/man, etc... This is much saner and easier to predict than PREFIX. I install SpamAssassin in a non-standard location in order to permit multiple parallel installations. This sounds much closer to what I want - it's really painful to get MakeMaker to do the right thing. hmm, are you using the way documented in the INSTALL file? as far as I know that should work reliably -- perl Makefile.PL PREFIX=$HOME I *think* we got that working eventually. agreed, it was tricky due to EU::MM weirdness. - --j. -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFBrgY3MJF5cimLx9ARAsQLAKCUwQ4RQEs4h/BOyux7VBlRb6yvYwCeI1Gm YtvvtrcF3IPaj1ofRas355A= =f7/Y -END PGP SIGNATURE-
Re: Cleaning up the test framework
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Matt Sergeant writes: On 1 Dec 2004, at 15:13, Malte S. Stretz wrote: So I'd like to keyword some of the tests as "basic" (or whatever keyword) and only those tests are run per default. All other tests would be used by us devs, people who we ask to debug one of their bug reports and the BuildBots. No more options, please. And there's no reason to speed it up for users because users only run make test once in a while. My idea was that per default all tests are run (except everything which requires further set-up or can fail easily like net tests or SQL). See prove (now shipped with core perl) and Test::Verbose's tv command (which is indispensable to any perl developer IMHO). Also consider adding some tests to CVS but not adding them to the MANIFEST, which will achieve what you require there. prove is good, and already works with our current t scripts too -- bonus! Re: adding some tests to CVS but not adding them to the MANIFEST: that will indeed result in some tests that aren't in the distro but are run from svn, but that doesn't address the situation where a test needs extra configuration data (such as LDAP schema to use, LDAP server, blah blah). Still, it's a good way to have SVN-only tests. - --j. -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFBr2maMJF5cimLx9ARAnpZAJwNso5hp9fhFfzxulrdO9YG4a/JqACgs3hQ J/sKng0FFtWQ2AGmJjfG2tw= =dZZD -END PGP SIGNATURE-
Re: svn commit: r109552 - /spamassassin/trunk/lib/Mail/SpamAssassin.pm /spamassassin/trunk/spamd/spamd.raw
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1

as a matter of interest -- I guess this is for Daniel -- why is that debug-area-splitting-and-validation code not part of Mail::SpamAssassin, anyway? Looks like it's duplicated in spamd, spamassassin, and sa-learn, which will result in it getting changed in one and forgotten in the others (as has just happened here ;) Code duplication = bad.

- --j.

[EMAIL PROTECTED] writes:
Author: mss
Date: Thu Dec 2 13:36:57 2004
New Revision: 109552
URL: http://svn.apache.org/viewcvs?view=rev&rev=109552
Log:
Made it possible to replace all the warn() kludges in spamd with dbg() or info() calls.
Modified:
   spamassassin/trunk/lib/Mail/SpamAssassin.pm
   spamassassin/trunk/spamd/spamd.raw
Modified: spamassassin/trunk/lib/Mail/SpamAssassin.pm
Url: http://svn.apache.org/viewcvs/spamassassin/trunk/lib/Mail/SpamAssassin.pm?view=diff&rev=109552&p1=spamassassin/trunk/lib/Mail/SpamAssassin.pm&r1=109551&p2=spamassassin/trunk/lib/Mail/SpamAssassin.pm&r2=109552
==============================================================================
--- spamassassin/trunk/lib/Mail/SpamAssassin.pm (original)
+++ spamassassin/trunk/lib/Mail/SpamAssassin.pm Thu Dec 2 13:36:57 2004
@@ -243,16 +243,8 @@
   if (!defined $self) { $self = { }; }
   bless ($self, $class);
 
-  # define debugging facilities first
-  $INFO = 0;
-  $DEBUG = 0;
-  if (defined $self->{debug} && ref($self->{debug}) eq "ARRAY") {
-    $facilities{$_} = 1 for @{ $self->{debug} };
-    # turn on informational notices
-    $INFO = 1 if keys %facilities;
-    # turn on debugging if facilities other than info are enabled
-    $DEBUG = keys %facilities && !(keys %facilities == 1 && $facilities{info});
-  }
+  # enable or disable debugging
+  Mail::SpamAssassin::_init_debugger(ref $self->{debug} eq 'ARRAY' ? @{ $self->{debug} } : ());
 
   # first debugging information possibly printed should be the version
   info("generic: SpamAssassin version ".Version());
@@ -280,6 +272,25 @@
   $self;
 }
 
+
+# Do not use this routine in any 3rd-party scripts, it's not part of the
+# official public API! spamd needs it though.
+#
+# Enables or disables debugging based on the facilities given. This will
+# affect ALL SpamAssassin objects!
+sub _init_debugger {
+  # define debugging facilities first
+  $INFO = 0;
+  $DEBUG = 0;
+  if (@_) {
+    $facilities{$_} = 1 for @_;
+    # turn on informational notices
+    $INFO = 1 if keys %facilities;
+    # turn on debugging if facilities other than info are enabled
+    $DEBUG = keys %facilities && !(keys %facilities == 1 && $facilities{info});
+  }
+}
+
 sub create_locker {
   my ($self) = @_;
Modified: spamassassin/trunk/spamd/spamd.raw
Url: http://svn.apache.org/viewcvs/spamassassin/trunk/spamd/spamd.raw?view=diff&rev=109552&p1=spamassassin/trunk/spamd/spamd.raw&r1=109551&p2=spamassassin/trunk/spamd/spamd.raw&r2=109552
==============================================================================
--- spamassassin/trunk/spamd/spamd.raw (original)
+++ spamassassin/trunk/spamd/spamd.raw Thu Dec 2 13:36:57 2004
@@ -217,7 +217,7 @@
   'auto-whitelist|whitelist|a' => sub { warn "The -a option has been removed. Please look at the use_auto_whitelist config option instead.\n"; exit 2; },
 ) or print_usage_and_exit();
-
+
 if ($opt{'help'}) {
   print_usage_and_exit(qq{For more details, use man spamd.\n}, 'EX_OK');
 }
@@ -226,6 +226,25 @@
   exit($resphash{'EX_OK'});
 }
 
+
+# Enable debugging, if any areas were specified. We do this already here,
+# accessing some non-public API so we can use the convenient dbg() routine.
+my @DEBUG;
+if (defined $opt{'debug'}) {
+  if ($opt{'debug'}) {
+    @DEBUG = split(/,/, $opt{'debug'});
+    if (grep { !/^\S+$/ } @DEBUG) {
+      warn "bad areas in --debug option\n";
+    }
+  }
+  else {
+    @DEBUG = ("all");
+  }
+}
+# Don't do this at home (aka any 3rd party tools), kids!
+Mail::SpamAssassin::_init_debugger(@DEBUG);
+
+
 # bug 2228: make the values of (almost) all parameters which accept file paths
 # absolute, so they are still valid after daemonize()
 foreach my $opt (
@@ -728,19 +747,6 @@
   Mail::SpamAssassin::Util::untaint_file_path( $opt{'pidfile'} );
 }
 
-# set debug areas, if any specified (only useful for command-line tools)
-my @debug;
-if (defined $opt{'debug'}) {
-  if ($opt{'debug'}) {
-    @debug = split(/,/, $opt{'debug'});
-    if (grep { !/^\S+$/ } @debug) {
-      warn "bad areas in --debug option\n";
-    }
-  }
-  else {
-    @debug = ("all");
-  }
-}
 
 my $spamtest = Mail::SpamAssassin->new( {
@@ -748,7 +754,7 @@
   rules_filename       => ( $opt{'configpath'} || 0 ),
   site_rules_filename  => ( $opt{'siteconfigpath'} || 0 ),
   local_tests_only     => ( $opt{'local'} || 0 ),
-  debug                => [EMAIL PROTECTED],
+  debug                => [EMAIL PROTECTED],
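One way to remove the duplication Justin points out would be a single shared helper that every front-end calls. The sketch below is hypothetical (neither `parse_debug_areas` nor `debug_flags` exists in SpamAssassin): it reproduces the splitting/validation block from spamd.raw and the $INFO/$DEBUG rule from `_init_debugger()`, but returns values instead of setting package globals, so it can be tested on its own.

```perl
use strict;
use warnings;

# Hypothetical shared helper for the --debug splitting/validation block
# that spamd.raw (and, per the thread, spamassassin and sa-learn) carry.
#   undef      -> debugging not requested      -> ()
#   '' or '0'  -> bare --debug                 -> ('all')
#   'a,b'      -> explicit areas               -> ('a', 'b')
sub parse_debug_areas {
    my ($opt) = @_;
    return () unless defined $opt;
    return ('all') unless $opt;             # same truth test as spamd.raw
    my @areas = split(/,/, $opt);
    if (grep { !/^\S+$/ } @areas) {
        warn "bad areas in --debug option\n";
    }
    return @areas;
}

# The $INFO/$DEBUG rule from _init_debugger(), returned instead of being
# stored in package globals: INFO is on if any facility is given, DEBUG
# is on unless "info" is the only one.
sub debug_flags {
    my %facilities = map { $_ => 1 } @_;
    my $count = scalar keys %facilities;
    my $info  = $count ? 1 : 0;
    my $debug = ($count && !($count == 1 && $facilities{info})) ? 1 : 0;
    return ($info, $debug);
}

my @areas = parse_debug_areas('bayes,dns');
my ($info, $debug) = debug_flags(@areas);
print "areas=@areas info=$info debug=$debug\n";   # prints "areas=bayes dns info=1 debug=1"
```

With something like this living in Mail::SpamAssassin, each front-end would only pass `$opt{'debug'}` through, so a fix to the validation would land in one place.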
Re: svn commit: r109710 - /spamassassin/branches/3.0/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Theo Van Dinter writes: This came up on the list; I considered it trivial enough to just go ahead and commit it to the 3.0 branch. If there's an issue, let me know. On Fri, Dec 03, 2004 at 05:23:15PM -, [EMAIL PROTECTED] wrote:
-    next unless ($scanner->{conf}->is_rule_active('body_evals',$rulename));
+    next unless ($scanner->{conf}->is_rule_active('body_evals',$rulename) ||
+                 $scanner->{conf}->is_rule_active('head_evals',$rulename));
meh, fine by me ;) one-liner. - --j. -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFBsKlaMJF5cimLx9ARAgOpAKCULdhKu/NIf5F45osEeIUEMIsjVQCfYA/v n7D/7BN7TPP6TfAtblpUcbM= =5KEh -END PGP SIGNATURE-
Re: Cron release@bugzilla $HOME/bin/extract_to_rsync_dir nightly /home/corpus-rsync/corpus/nightly-versions.txt $HOME/extract.log
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Theo Van Dinter writes: On Fri, Dec 03, 2004 at 12:58:24AM -0800, Cron Daemon wrote: svn: In directory 'nightly_mass_check/rules' svn: Can't copy 'nightly_mass_check/rules/.svn/tmp/text-base/25_razor2.cf.svn-base' to 'nightly_mass_check/rules/25_razor2.cf.tmp': No space left on device Oops! /dev/sda3 7701432 7306716 3500 100% / The current largest thing is Justin's home directory at 3GB: 3454580 jm oops! lots of old GA run data; mostly collated logs. nuked. - --j. -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFBsK36MJF5cimLx9ARAlTvAKCuvznccztgnJUOFPXAUPFwZMPlVQCgpRUY FQ2CsqvvDpg08XKkGYIP7ek= =HeBW -END PGP SIGNATURE-
Re: [Bug 4016] New: excessive use of fds
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 yes *please* ;) - --j. Tony Finch writes: I've been doing some DNS-intensive work with ADNS recently, and I was reminded how fast it is and how easy it is to run bulk jobs with over 10,000 concurrent DNS queries. You only need two sockets! Maybe I should beat Net::DNS to death with the clue bat. http://www.livejournal.com/users/fanf/ http://www.chiark.greenend.org.uk/~ian/adns/ Tony. -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFBsROJMJF5cimLx9ARAi7TAKC9zf8CcIrTf2ePYfmE3h/HTYqLggCgsANo ILD5nFNFjE7fhdDMhTMmMNk= =9ITT -END PGP SIGNATURE-
Re: [Fwd: Re: Addressing wiki vandalism (fwd)]
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Upayavira writes: Sidney Markowitz wrote: I have created the account SidneyMarkowitz on wiki.apache.org/general. Please give it access to edit LocalBadContent, as described in the forwarded message below. I'm a SpamAssassin committer. Now, I don't know who you are, so I can't really add you myself comfortably (without, e.g. a CC to a PMC). However, anyone else who is already on the list can add you by editing wiki.apache.org/general/LeoSimons/AdminGroup. Once your name is on that page, you too will be able to add people. I guess this is a fair enough approach. Self organising community. Hmm. indeed, that works nicely ;) Sidney, you're added. anyone else, please CC pmc /at/ SpamAssassin.apache.org when requesting. hmm: in fact, it may be ok to just email PMC alone, since I/Daniel can do it now without infrastructure help. (Upayavira, does that make sense or would addressing to infra be better?) - --j. Regards, Upayavira Original Message Subject: Re: Addressing wiki vandalism (fwd) Date: Mon, 06 Dec 2004 12:41:42 -0800 From: [EMAIL PROTECTED] To: dev@spamassassin.apache.org FYI -- if you're a committer, please sign up to gain access to edit this page -- URLs listed in LocalBadContent will be blocked on our wiki. mail infrastructure /at/ apache.org with a request (once you've created a user account on wiki.apache.org/general.) --j. --- Forwarded Message Date:Mon, 06 Dec 2004 20:02:16 + From:Upayavira [EMAIL PROTECTED] To: Apache Infrastructure [EMAIL PROTECTED] Subject: Re: Addressing wiki vandalism Justin Mason wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Leo Simons writes: http://wiki.apache.org/general/LocalBadContent I've tried to set up some access control so that only users listed on http://wiki.apache.org/general/LeoSimons/AdminGroup can edit it (and only users on that page can edit that page). I'm not sure if that works. 
Could DavidCrossley or UpayaVira try and confirm they can edit, and someone else try and confirm they cannot? My new account there is DanielQuinlan. I'm afraid you'll need to get an account on wiki.apache.org/general as well so I can add you to that admingroup page. I've created an account there -- JustinMason. However, is it wise to restrict who can police the wiki-spam? I don't want to be the lone guy among all the SpamAssassin wiki editors who can block a spammer. It completely is necessary to restrict the page. Otherwise a spammer can remove his own site! As for a policy, I would have us add any committer who asks. I've added you two, anyway. Regards, Upayavira --- End of Forwarded Message -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD4DBQFBtOUpMJF5cimLx9ARAtisAJ9R3E90lALqzyGgxTU+/4EvPR0jgQCXXRnt V6Uu5hklmkTbalNaQ9u4EA== =Y8Sg -END PGP SIGNATURE-
Re: [SpamAssassin Wiki] Updated: CommercialNetworkAppliances
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 [EMAIL PROTECTED] writes: Date: 2004-12-08T11:22:11 Editor: MrElvey [EMAIL PROTECTED] Wiki: SpamAssassin Wiki Page: CommercialNetworkAppliances URL: http://wiki.apache.org/spamassassin/CommercialNetworkAppliances Justin Mason told me he is an IronPort employee at the FTC Summit last month. I suspect that may have been Daniel Quinlan ;) - --j. -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFBt1cZMJF5cimLx9ARAsS3AJ9O7kSBDlARkSUOoKDbRxlzcMuMbgCgndZm dzVqAAlT2cIGqQad7ftugow= =oiHf -END PGP SIGNATURE-
Re: svn commit: r111767 - /spamassassin/trunk/rules/70_testing.cf
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 [EMAIL PROTECTED] writes:
Author: quinlan
Date: Mon Dec 13 16:39:58 2004
New Revision: 111767
URL: http://svn.apache.org/viewcvs?view=rev&rev=111767
Log:
remove Flex Hex rules due to low accuracy

what were the results?

- --j.

Modified:
   spamassassin/trunk/rules/70_testing.cf
Modified: spamassassin/trunk/rules/70_testing.cf
Url: http://svn.apache.org/viewcvs/spamassassin/trunk/rules/70_testing.cf?view=diff&rev=111767&p1=spamassassin/trunk/rules/70_testing.cf&r1=111766&p2=spamassassin/trunk/rules/70_testing.cf&r2=111767
==============================================================================
--- spamassassin/trunk/rules/70_testing.cf (original)
+++ spamassassin/trunk/rules/70_testing.cf Mon Dec 13 16:39:58 2004
@@ -444,10 +444,6 @@
-body T_HTML_COLOR_FLEX_HEX_1 eval:html_test('flex_hex1')
-body T_HTML_COLOR_FLEX_HEX_2 eval:html_test('flex_hex2')
-body T_HTML_COLOR_FLEX_HEX_3 eval:html_test('flex_hex3')
-
 body T_HTML_TAG_EXIST_BGSOUND eval:html_tag_exists('bgsound')
 body T_HTML_IMAGE_SIZE_ZERO eval:html_test('image_size_zero')

-BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFBvjfoMJF5cimLx9ARAqmeAJoC1LsFiyy7oAgPh2cn5hzSIwKuPQCbBBqg GPRmVGi65Qmr255n9XfEjQc= =hWvP -END PGP SIGNATURE-
Storable and hyperthreading
OK, so on the spamd hang bugs, we have: - a set of people reporting hangs predominantly (all?) when running spamd on hyperthreaded CPUs - not all HT CPUs are acting up - a hang traced into Storable::dclone() (thanks Dallas!) so I think we may have run into a perl thread-safety bug, possibly in Storable, possibly at a lower level, and running on HT cpus causes this bug to manifest itself. Another reason to get rid of our use of Storable, in my opinion. --j.
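If Storable were dropped, `dclone()` calls on plain data structures could in principle be replaced by a pure-Perl deep copy. The sketch below is only an assumption about what such a replacement might look like; the thread did not settle on it, and nothing here shows that it avoids the hang. It copies hashes, arrays and scalar refs, keeps blessing, preserves shared and circular references through a seen-map, and leaves code refs, globs and other types shared rather than copied.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed refaddr reftype);

# Hypothetical pure-Perl stand-in for Storable::dclone() on plain data.
# Copies HASH, ARRAY and SCALAR/REF containers recursively, re-blesses
# copies of blessed containers, and uses %$seen so shared or circular
# references are copied once. Other types (CODE, GLOB, ...) are shared.
sub deep_copy {
    my ($data, $seen) = @_;
    $seen ||= {};
    return $data unless ref $data;                      # plain scalar
    my $addr = refaddr($data);
    return $seen->{$addr} if exists $seen->{$addr};     # already copied
    my $kind = reftype($data);
    my $copy;
    if ($kind eq 'HASH') {
        $copy = {};
        $seen->{$addr} = $copy;
        $copy->{$_} = deep_copy($data->{$_}, $seen) for keys %$data;
    } elsif ($kind eq 'ARRAY') {
        $copy = [];
        $seen->{$addr} = $copy;
        push @$copy, deep_copy($_, $seen) for @$data;
    } elsif ($kind eq 'SCALAR' || $kind eq 'REF') {
        my $inner;
        $copy = \$inner;
        $seen->{$addr} = $copy;                         # register before recursing
        $inner = deep_copy($$data, $seen);
    } else {
        return $data;                                   # not copied
    }
    bless($copy, ref($data)) if blessed($data);
    return $copy;
}

my $conf  = { rules => { FOO => 1.0 }, order => [qw(FOO BAR)] };
my $clone = deep_copy($conf);
$clone->{rules}{FOO} = 2.5;
print "original FOO: $conf->{rules}{FOO}\n";    # prints "original FOO: 1"
```

Unlike `dclone()`, this does not serialize, so it also avoids Storable's XS code path entirely; whether that matters for the HT hangs is exactly what would need testing.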
Re: YOU ARE ON THE WAY TO DESTRUCTION
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Michael Parker writes: On Wed, Dec 15, 2004 at 04:25:29PM -0800, Daniel Quinlan wrote: Bugzilla says we can release 3.0.2 so I therefore propose we release 3.0.2. +1 for release, all tests pass on several of my machines. +1, if we're all clear, let's go for it; I'm not going to hold for 3828 in that case. (btw I get: Like they said at NASA - Better, faster, cheaper - you get to pick two. appropriate!) - --j. -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFBwNwaMJF5cimLx9ARAizdAJ4mS7zwqf2x977B0HZ1P+bM7uRkwwCgkGxl b7c17YUNX8XcaUovroTQT4U= =etJM -END PGP SIGNATURE-
Re: YOU ARE ON THE WAY TO DESTRUCTION
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Theo Van Dinter writes: It's moot at this point, and I meant to vote -0.5 and not -1 BTW. I wanted more of an "I'd rather not yet" versus "NFW". It was more of a shock: nothing at all for weeks about a release, then suddenly, the one night I'm not sitting online, a bunch of stuff happens and a release occurs. I suspect something must have happened in IRC. I wasn't there either ;) Anyway, as for reasoning -- I have been having conversations with the Habeas folks to get this code/support into 3.0.2. Per my last message I've already explained how since there wasn't any discussion for weeks now wrt a release, there wasn't extreme urgency in doing a code review. Had I known there was going to be a release tonight/this week/this month, I would have made an effort to free up enough time to do the review beforehand. ouch. agreed, that's not too hot, but it's really more the fault of the 3.0.2-vs-Future slip-up... - --j. -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFBwd0MMJF5cimLx9ARAof2AKCDONL0rxTpUp1PAz235m3+yrG8oQCcC2cT CUrT/jr91f0fXtjkqqC3Lw4= =ObJV -END PGP SIGNATURE-
Re: svn commit: r122529 - /spamassassin/trunk/lib/Mail/SpamAssassin/Reporter.pm
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 [EMAIL PROTECTED] writes:
Author: felicity
Date: Wed Dec 15 22:25:05 2004
New Revision: 122529
URL: http://svn.apache.org/viewcvs?view=rev&rev=122529
Log:
got a syntax error doing reporting. also, no point in doing regexp since we're looking for explicit strings, just use eq.

what about the newline?

- --j.

Modified:
   spamassassin/trunk/lib/Mail/SpamAssassin/Reporter.pm
Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Reporter.pm
Url: http://svn.apache.org/viewcvs/spamassassin/trunk/lib/Mail/SpamAssassin/Reporter.pm?view=diff&rev=122529&p1=spamassassin/trunk/lib/Mail/SpamAssassin/Reporter.pm&r1=122528&p2=spamassassin/trunk/lib/Mail/SpamAssassin/Reporter.pm&r2=122529
==============================================================================
--- spamassassin/trunk/lib/Mail/SpamAssassin/Reporter.pm (original)
+++ spamassassin/trunk/lib/Mail/SpamAssassin/Reporter.pm Wed Dec 15 22:25:05 2004
@@ -239,9 +239,9 @@
   if ($err) {
     alarm $oldalarm;
-    if ($err =~ /^__alarm__$/) {
+    if ($err eq '__alarm__') {
       dbg("reporter: pyzor report timed out after $timeout seconds");
-    } elsif ($err /^__brokenpipe__$/) {
+    } elsif ($err eq '__brokenpipe__') {
       dbg("reporter: pyzor report failed: broken pipe");
     } else {
       warn("reporter: pyzor report failed: $err\n");

-BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFBwds7MJF5cimLx9ARAii3AJ9P6ssd2Qbh47kImDy0Ns0w01wxpACeP374 DEDBV1jX/5zg4+qO3fgxCgI= =uN5K -END PGP SIGNATURE-
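The newline question matters because `$` in `/^__alarm__$/` also matches just before a trailing newline, while `eq` needs an exact match. Alarm handlers in this style usually `die "__alarm__\n"` (the trailing newline stops Perl from appending "at FILE line N"); assuming Reporter.pm's handler does the same, the new `eq` tests would never match. A small demonstration, plus one hypothetical way to keep `eq` by stripping a single trailing newline first:

```perl
use strict;
use warnings;

# A die() message with a trailing newline, as an alarm handler written
# like  $SIG{ALRM} = sub { die "__alarm__\n" }  would produce.
my $err = "__alarm__\n";

my $regex_match = ($err =~ /^__alarm__$/) ? 1 : 0;   # 1: $ matches before the final "\n"
my $eq_match    = ($err eq '__alarm__')   ? 1 : 0;   # 0: eq compares the whole string

# Hypothetical helper: strip one trailing newline, then compare with eq.
sub classify_error {
    my ($e) = @_;
    (my $trimmed = $e) =~ s/\n\z//;
    return 'timeout'    if $trimmed eq '__alarm__';
    return 'brokenpipe' if $trimmed eq '__brokenpipe__';
    return 'other';
}

print "regex=$regex_match eq=$eq_match ", classify_error($err), "\n";
# prints "regex=1 eq=0 timeout"
```

Keeping the anchored regex would also work; `/^__alarm__\z/` is the form to use if a trailing newline should deliberately not match.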
Re: buildbot failure in [...]
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Malte S. Stretz writes: On Friday 17 December 2004 11:36 CET [EMAIL PROTECTED] wrote: The Buildbot has detected a new failure of trunk-debian-stable. Buildbot URL: http://bugzilla.spamassassin.org:8010/ Build Reason: changes Build Source Stamp: 112 Blamelist: quinlan BUILD FAILED: failed svn Those messages are getting a bit annoying; is there any way to filter any buildbot message which contains "BUILD FAILED: failed svn" on the server? no! (a) they really are failures. in this case the svn server seems to have died, which is good to know ;) The whole point of this is to get notification of failures. (b) however the -parker- and -sidney- ones *are* getting annoying. ;) I suggest we turn off those slaves until we can figure out how to get buildbot to work with dynamic-IP slaves... - --j. -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFBwyH4MJF5cimLx9ARAuMbAJ9SSnez7MSgQtUsq9JlKFnP6t8EEACfYqZo Cnd/J6zOu6Gqe6h+HHXvKQE= =X3Rp -END PGP SIGNATURE-
Re: buildbot failure in [...]
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Sidney Markowitz writes: Justin Mason wrote: Sidney, have you tried setting --keepalive=300 I'll try that. What Michael says does make sense. I'm behind a NAT. Is there a way of setting a port that the slave listens on? I can configure my NAT to let the slaves be designated servers on some port if I can make it a fixed port and assign a different port number to each of them. I'm sure if it is possible I could find it by RTFM, but I have not had a lot of time to learn about buildbot and twistd. hmm -- I don't think the slaves *ever* listen on a port -- instead they open a conn _out_ to the master. By the way I have to call twistd directly instead of buildbot in order to get everything to work in Cygwin and Win32. They need the -n option in order to run, and in Win32 I have to give it the -r win32, which I would have expected to be automatic when running a win32 buildbot. Cygwin command: twistd -l - -n -f ../buildbot.tap Win32 command: twistd -l - -n -r win32 -f ..\buildbot.tap might be worth signing up to buildbot-devel (it's very low traffic) and mention that... - --j. -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFBw0hJMJF5cimLx9ARAtKfAKCCBDuRXE15qvY/xtcCaH5j0IYdDgCdHKAq CXAnBD9iVkyT8uuiNhIKzDs= =1otV -END PGP SIGNATURE-
Re: buildbot failure in [...]
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Sidney Markowitz writes: Malte S. Stretz wrote: Does anybody know what exactly goes wrong? Maybe it could work if we use port forwarding or stunnel or something to route the traffic to the dynamic clients over some server with a static IP? Here's my last svn failed log. It was on the native machine, and I just discovered that the two slaves on the VMWare virtual machine have been not responding for a couple of days, so it cannot be a matter of simultaneous access on the same machine. I'm going to try to restart them. Could the svn server be sensitive to too many clients hitting the same repository at the same time? Perhaps it would help to introduce a delay between triggering one slave and the next, or if that is not possible adding a sleep of a random time on the slaves before the svn update. I doubt that's it. First off, the svn failed logs were the same on all slaves as of the last svn checkin -- see http://bugzilla.spamassassin.org:8010/trunk-red-hat-7.3/builds/89/svn/0 http://bugzilla.spamassassin.org:8010/reqd-modules-only-5.8.1/builds/76/svn/0 both are running on the buildbot master machine as well. that's just because the SVN server was borked. Secondly, I have 4 slaves (a) started simultaneously and (b) hitting the repo simultaneously, on the buildbot machine. And if you look at 15:31:38 on Thu Dec 16, you can see 7 slaves hitting svn simultaneously, and all passing. So that's not it. Basically we have: - buildbot master host, localhost, no NAT: 5 slaves, always pass - jm: 1 slave, static IP, no NAT: debian-stable, always passes - parker: 3 slaves, behind NAT: frequent failures - sidney: 3 slaves, NAT?: frequent failures I think it's the NAT that causes the issue, and therefore the keepalive idea is the best bet... BTW bear in mind that the slaves are never connected *to*. Instead, they operate by opening a TCP connection to the master at startup, and receiving commands pushed to them via that. 
if that TCP conn dies, they disappear, and retry connections very slowly, like once every 10 mins with exponential backoff. - --j. -- sidney The log: starting svn operation command '['svn', 'update', '--revision', '122631']' in dir /b/home/buildbot/slaves/sidney-fedora3/trunk-sidney-fedora3/build (timeout 1200 secs) svn: PROPFIND request failed on '/repos/asf/spamassassin/trunk' svn: PROPFIND of '/repos/asf/spamassassin/trunk': Could not read status line: connection was closed by server. (http://svn.apache.org) update failed, clobbering and trying again command '['rm', '-rf', '/b/home/buildbot/slaves/sidney-fedora3/trunk-sidney-fedora3/build']' in dir /b/home/buildbot/slaves/sidney-fedora3/trunk-sidney-fedora3 (timeout 1200 secs) now retrying VC operation command '['svn', 'checkout', '--revision', '122631', 'http://svn.apache.org/repos/asf/spamassassin/trunk', 'build']' in dir /b/home/buildbot/slaves/sidney-fedora3/trunk-sidney-fedora3 (timeout 1200 secs) svn: PROPFIND request failed on '/repos/asf/spamassassin/trunk' svn: PROPFIND of '/repos/asf/spamassassin/trunk': Could not read status line: connection was closed by server. (http://svn.apache.org) program finished with exit code 1 -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFBw1XoMJF5cimLx9ARAmbsAJ0QFRYByCiQ4WY6K47wN/E7wxru0ACeOHNj JTOK7lD2BWBdKwyF7DPs0sM= =xqJz -END PGP SIGNATURE-
Re: buildbot failure in [...]
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Daniel Quinlan writes: [EMAIL PROTECTED] (Justin Mason) writes: Well, I think "failed svn" is something that all build failures produce. Even if the problem is a bug, rather than an svn timeout... I think we should remove (a) all of the NATed slaves and (b) any build server that can't reliably connect to the server. I'm already ignoring all failures, so the purpose of the build system has been completely lost. It's more important to have reliable build hosts than maintain the excessive build host diversity that we have right now. agreed, to be honest... Sidney, Michael, what do you think? - --j. -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Exmh CVS iD8DBQFBw2HuMJF5cimLx9ARAgvTAKCAR41dM/Ch8Ug0FG0acfWeHOpRHQCfZPQY RmmeWBs/GxxEnow3wJ6NhJo= =0BQo -END PGP SIGNATURE-
Re: RFC: New Plugin Hook
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 makes sense to me. I'd (a) expand the doco, and (b) use a better name than verify_user for the method, as it took a while for me to grok it. rather than verify_user, how's about service_acl_allows_username or similar? - --j. Michael Parker writes: Howdy, I was looking at a possible solution to: http://bugzilla.spamassassin.org/show_bug.cgi?id=3215 and decided that it could be done pretty easily if I had a new plugin hook. So, I created one; I wanted to get y'all's opinion on it before I went forward. The new plugin hook is verify_user, which takes a services hash and a username as input. The plugin is responsible for a) making sure it is supposed to handle one of the passed-in services and b) that the username is allowed/whatever to use the service. Please see the enclosed diff and sample plugin that implements the feature. Obviously, this is a bayessql-specific case, but I could see it being used in other areas of the code. You could have multiple plugins that handled different authorization methods. Comments? Michael Here is the diff:
Index: lib/Mail/SpamAssassin/BayesStore/SQL.pm
===================================================================
--- lib/Mail/SpamAssassin/BayesStore/SQL.pm (revision 122598)
+++ lib/Mail/SpamAssassin/BayesStore/SQL.pm (working copy)
@@ -140,7 +140,7 @@
   }
 
   unless ($self->_initialize_db()) {
-    dbg("bayes: database entry for ".$self->{_username}." not found");
+    dbg("bayes: unable to initialize database for ".$self->{_username}." user, aborting!");
     $self->untie_db();
     return 0;
   }
@@ -1733,6 +1733,20 @@
   return 0 if (!$self->{_username});
 
+  # Check to see if we should call the verify_user plugin hook to see if this
+  # user is allowed/able to use bayes. If not, do nothing and return 0.
+  if ($self->{bayes}->{conf}->{bayes_sql_verify_user}) {
+    my $services = { 'bayessql' => 0 };
+    $self->{bayes}->{main}->call_plugins("verify_user", { services => $services,
+                                                         username => $self->{_username},
+                                                       });
+
+    unless ($services->{bayessql}) {
+      dbg("bayes: username not verified by verify_user plugin");
+      return 0;
+    }
+  }
+
   my $sqlselect = "SELECT id FROM bayes_vars WHERE username = ?";
   my $sthselect = $self->{_dbh}->prepare_cached($sqlselect);
Index: lib/Mail/SpamAssassin/Plugin.pm
===================================================================
--- lib/Mail/SpamAssassin/Plugin.pm (revision 122598)
+++ lib/Mail/SpamAssassin/Plugin.pm (working copy)
@@ -219,6 +219,34 @@
 =back
 
+=item $plugin->verify_user ( { options ... } )
+
+=over 4
+
+=item services
+
+Reference to a hash containing the services you want to check.
+
+In order to verify a user, the plugin should first check that the
+service it is handling exists in the hash and then set the value
+of the service to a positive value if the username is verified/validated
+for that service.
+
+The currently supported services are:
+
+=over 4
+
+=item bayessql
+
+=back
+
+
+=item username
+
+A username
+
+=back
+
 =item $plugin->check_start ( { options ... } )
 
 Signals that a message check operation is starting.
Index: lib/Mail/SpamAssassin/Conf.pm
===================================================================
--- lib/Mail/SpamAssassin/Conf.pm (revision 122598)
+++ lib/Mail/SpamAssassin/Conf.pm (working copy)
@@ -2719,6 +2719,28 @@
     type => $CONF_TYPE_STRING
   });
 
+=item bayes_sql_verify_user (0 | 1) (default: 0)
+
+Whether to call the verify_user plugin hook in BayesSQL. If the hook
+does not determine that the user is allowed to use bayes or is invalid
+then the database will not be initialized.
+
+NOTE: By default the user is considered invalid until a plugin returns
+a true value. If you enable this, but do not have a proper plugin
+loaded, all users will turn up as invalid.
+
+The username passed into the plugin can be affected by the
+bayes_sql_override_username config option.
+ +=cut + + push (@cmds, { +setting = 'bayes_sql_verify_user', +is_admin = 1, +default = 0, +type = $CONF_TYPE_BOOL + }); + =item user_scores_dsn DBI:databasetype:databasename:hostname:port If you load user scores from an SQL database, this will set the DSN Here is the sample plugin: package Mail::SpamAssassin::Plugin::VerifyUser; =pod This is a sample plugin, it may not work at all, so buyer beware. It also uses an experimental plugin hook, that may or may not be supported. The groupfile for this feature looks something like: bayessql: parker foobar1 foobar2 =cut use Mail::SpamAssassin::Plugin; use strict; use bytes; use Apache::Htgroup; use vars qw(@ISA); @ISA = qw(Mail::SpamAssassin::Plugin); use constant GROUPFILE =
Re: RFC: New Plugin Hook
Michael Parker writes:
> On Fri, Dec 17, 2004 at 05:38:26PM -0800, Justin Mason wrote:
> > ok -- service_allowed_for_username -- there's only one service for
> > each call. ;)
>
> Why put that sort of restriction?  what if I wanted something like:
>
> $services = { 'bayessql' => 0, 'awl' => 0, 'awlsql' => 0, 'allow_user_rules' => 0, 'etc' => 1 }
>
> I've implemented it as a single service in BayesSQL but there is no
> reason why you couldn't move the plugin call to a higher level and pass
> in ALL of the services you are interested in.

ah, missed that.  ok, makes sense.  that should probably be called out specifically in the doco...

--j.
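For illustration, here's a rough sketch of how a plugin could answer several services in one call, assuming the hook ends up named service_allowed_for_username (the name floated in this thread) and that each plugin only touches the services it knows about, leaving the rest for other plugins.  The %acl table and usernames are made up; a real plugin would load them from something like the group file in the sample plugin.

```perl
use strict;
use warnings;

# Hypothetical ACL: which usernames may use which service.  Hard-coded
# here purely for illustration.
my %acl = (
  bayessql => { parker => 1, foobar1 => 1 },
  awlsql   => { parker => 1 },
);

# Sketch of the hook body: for every service this plugin knows about,
# set it to 1 if the user is listed.  Services it doesn't know about are
# left untouched so that other plugins can still decide them.
sub service_allowed_for_username {
  my ($self, $opts) = @_;
  my $services = $opts->{services};
  my $username = $opts->{username};
  foreach my $service (keys %{$services}) {
    next unless exists $acl{$service};      # not ours to decide
    $services->{$service} = 1 if $acl{$service}->{$username};
  }
  return;
}

# Caller side: one call checks several services at once.
my $services = { bayessql => 0, awlsql => 0, allow_user_rules => 0 };
service_allowed_for_username(undef, { services => $services, username => 'parker' });
print join(',', map { "$_=$services->{$_}" } sort keys %{$services}), "\n";
```

The caller then just inspects whichever keys it cares about, so moving the call "to a higher level" costs nothing extra for plugins written this way.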
Re: RFC: New Plugin Hook
Michael Parker writes:
> On Fri, Dec 17, 2004 at 04:41:51PM -0800, Justin Mason wrote:
> > makes sense to me.  I'd (a) expand the doco, and (b) use a better name
> > than verify_user for the method, as it took a while for me to grok it.
> > rather than verify_user, how's about service_acl_allows_username or
> > similar?
>
> Oops, missed the whole "what this thing does" blurb in the POD.
>
> I'm horrible at naming things; how about services_allowed_for_username?

ok -- service_allowed_for_username -- there's only one service for each call. ;)

--j.
Re: svn commit: r124477 - /spamassassin/trunk/lib/Mail/SpamAssassin/EvalTests.pm /spamassassin/trunk/rules/20_body_tests.cf /spamassassin/trunk/rules/70_testing.cf
[EMAIL PROTECTED] writes:
> Author: quinlan
> Date: Fri Jan  7 00:06:07 2005
> New Revision: 124477
>
> URL: http://svn.apache.org/viewcvs?view=rev&rev=124477
> Log:
> promote T_BAD_ISO_CHARSET to MIME_BAD_ISO_CHARSET, but convert it to an
> eval function to avoid using a full test

we should really figure out some way to expose those in-body MIME headers in a new rule type...

--j.

> Modified:
>    spamassassin/trunk/lib/Mail/SpamAssassin/EvalTests.pm
>    spamassassin/trunk/rules/20_body_tests.cf
>    spamassassin/trunk/rules/70_testing.cf
>
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/EvalTests.pm
> Url: http://svn.apache.org/viewcvs/spamassassin/trunk/lib/Mail/SpamAssassin/EvalTests.pm?view=diff&rev=124477&p1=spamassassin/trunk/lib/Mail/SpamAssassin/EvalTests.pm&r1=124476&p2=spamassassin/trunk/lib/Mail/SpamAssassin/EvalTests.pm&r2=124477
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/EvalTests.pm	(original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/EvalTests.pm	Fri Jan  7 00:06:07 2005
> @@ -2353,6 +2353,12 @@
>          $self->{mime_base64_no_name} = 1;
>        }
>  
> +      if ($charset =~ /iso-\S+-\S+\b/i &&
> +          $charset !~ /iso-(?:8859-\d{1,2}|2022-(?:jp|kr))\b/)
> +      {
> +        $self->{mime_bad_iso_charset} = 1;
> +      }
> +
>        # MIME_BASE64_LATIN: now a zero-hitter
>        # if (!$name &&
>        #     $cte =~ /base64/ &&
> @@ -2414,7 +2420,7 @@
>            || ($name eq "xls" && $ctype !~ [EMAIL PROTECTED]/.*excel$@)
>            )
>        {
> -          $self->{mime_suspect_name} = 1;
> +        $self->{mime_suspect_name} = 1;
>        }
>      }
>    }
>
> Modified: spamassassin/trunk/rules/20_body_tests.cf
> Url: http://svn.apache.org/viewcvs/spamassassin/trunk/rules/20_body_tests.cf?view=diff&rev=124477&p1=spamassassin/trunk/rules/20_body_tests.cf&r1=124476&p2=spamassassin/trunk/rules/20_body_tests.cf&r2=124477
> ==============================================================================
> --- spamassassin/trunk/rules/20_body_tests.cf	(original)
> +++ spamassassin/trunk/rules/20_body_tests.cf	Fri Jan  7 00:06:07 2005
> @@ -123,6 +123,9 @@
>  body MPART_ALT_DIFF_COUNT	eval:multipart_alternative_difference_count('3', '1')
>  describe MPART_ALT_DIFF_COUNT	HTML and text parts are different
>  
> +body MIME_BAD_ISO_CHARSET	eval:check_for_mime('mime_bad_iso_charset')
> +describe MIME_BAD_ISO_CHARSET	MIME character set is an unknown ISO charset
> +
>  ###
>  
>  body CHARSET_FARAWAY		eval:check_for_faraway_charset()
>
> Modified: spamassassin/trunk/rules/70_testing.cf
> Url: http://svn.apache.org/viewcvs/spamassassin/trunk/rules/70_testing.cf?view=diff&rev=124477&p1=spamassassin/trunk/rules/70_testing.cf&r1=124476&p2=spamassassin/trunk/rules/70_testing.cf&r2=124477
> ==============================================================================
> --- spamassassin/trunk/rules/70_testing.cf	(original)
> +++ spamassassin/trunk/rules/70_testing.cf	Fri Jan  7 00:06:07 2005
> @@ -354,11 +354,4 @@
>  
>  
>  
> -# bug 4054: contributions from Maxime Ritter (airmax.cf)
> -
> -# only works on full, may be better to check in Message object for this
> -full __ISO_VALID	/charset=\"?iso-(?:8859-\d{1,2}|2022-(?:jp|kr))\b/i
> -full __ISO_CHARSET	/charset=\"?iso-\S+-\S+\b/i
> -meta T_BAD_ISO_CHARSET	(__ISO_CHARSET && !__ISO_VALID)
> -
>  body T_NORMAL_HTTP_TO_IP	eval:check_numeric_http()
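To poke at the new eval logic outside SpamAssassin, here's a minimal standalone sketch that applies the same two patterns from the diff.  is_bad_iso_charset() is a hypothetical helper, not a SpamAssassin API.  The sketch lowercases the charset first because only the first pattern carries /i; whether check_for_mime() normalizes case before this point isn't visible in the diff, so treat that as an assumption.

```perl
use strict;
use warnings;

# Standalone sketch of the MIME_BAD_ISO_CHARSET check from r124477.
sub is_bad_iso_charset {
  my ($charset) = @_;
  return 0 unless defined $charset;
  $charset = lc $charset;   # assumption: normalize case before matching
  # looks like an ISO charset ("iso-<something>-<something>") ...
  return 0 unless $charset =~ /iso-\S+-\S+\b/i;
  # ... but is not one of the known-good ISO families
  return 0 if $charset =~ /iso-(?:8859-\d{1,2}|2022-(?:jp|kr))\b/;
  return 1;
}

for my $cs (qw(iso-8859-1 ISO-8859-15 iso-2022-jp iso-2022-cn iso-10646-ucs-2 utf-8)) {
  printf "%-16s %s\n", $cs, is_bad_iso_charset($cs) ? 'bad' : 'ok';
}
```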
Re: svn commit: r124477 - /spamassassin/trunk/lib/Mail/SpamAssassin/EvalTests.pm /spamassassin/trunk/rules/20_body_tests.cf /spamassassin/trunk/rules/70_testing.cf
Daniel Quinlan writes:
> [EMAIL PROTECTED] (Justin Mason) writes:
> > we should really figure out some way to expose those in-body MIME
> > headers in a new rule type...
>
> I was thinking the same thing.

oh good, so you've changed your mind since http://bugzilla.spamassassin.org/show_bug.cgi?id=3781#c3 then ;)

--j.
Re: rules needing work
Daniel Quinlan writes:

Rules with the largest RANK drops from 3.0 to now.  The body and Subject: ones probably need the most work.  The rest are probably lost causes.

I suggest we drop them, if their RANK is sufficiently low and nobody steps up to fix them; it's about time some rules (at least body rules, that is) finally got deleted from the default ruleset ;)

I don't have strong feelings about any apart from ALL_TRUSTED.

--j.

broken!
  -0.17  ALL_TRUSTED

work needed:
  -0.32  UNIVERSITY_DIPLOMAS
  -0.26  STOCK_PICK
  -0.18  STOCK_ALERT
  -0.15  SUBJECT_DRUG_GAP_S
  -0.14  STRONG_BUY
  -0.14  DEEP_DISC_MEDS
  -0.13  DRUGS_PAIN
  -0.11  REVERSE_AGING
  -0.11  BANG_OPRAH
  -0.1   NO_CREDIT_CHECK
  -0.1   WE_HONOR_ALL

not sure:
  -0.34  HTML_NONELEMENT_50_60
  -0.23  HELO_DYNAMIC_OOL
  -0.21  HTML_BADTAG_20_30
  -0.2   MIME_HTML_ONLY_MULTI
  -0.2   HTML_NONELEMENT_40_50
  -0.19  HTML_FONT_SIZE_NONE
  -0.19  HTML_FONT_SIZE_TINY
  -0.19  HTML_BADTAG_30_40
  -0.18  HTML_BADTAG_90_100
  -0.16  HDR_ORDER_TRIMRS
  -0.16  X_ORIG_IP_NOT_IPV4
  -0.11  HTML_NONELEMENT_90_100
  -0.11  HELO_DYNAMIC_ATTBI
Re: initial analysis of SPF_PASS results
Daniel Quinlan writes:
> First, large ISPs seem to be the origination point for a *lot* of spam.

Large ISPs' outbound relays, or direct from their dynamic pools?  e.g. blueyonder.co.uk list their dyn pools in their SPF record, which is unfortunate but legal.

> Second, here's my list of the domains we could potentially whitelist
> for SPF_PASS results (high count, good ratio, not biased towards open
> source folks).
>
>   0.0000   90  health.webmd.com
>   0.0000   27  foolsubs.com
>   0.0000   23  ms3.lga2.nytimes.com  (list *.nytimes.com ?)
>   0.0000   17  match.com
>   0.0000    9  paypal.com

+1 -- I can go for that.  (Worth noting that I *don't* think we should also apply the converse, treating mails from those doms that don't match the SPF record as forged; we'd need to do separate analysis on that.)

> For a different and even less biased approach, I took the listings with
> 0.01 or lower S/O ratio and ranked them by SenderBase volume (entries
> above 6.0 on the volume scale).  Note that I just extracted
> registrar-level domain names from the SPF domain lists, so some of these
> are definitely not completely clean or are not immediately
> whitelistable.
>
>   domain             volume  whitelist?
>   -----------------  ------  ----------
>   ebay.com           7.5     yeah
>   amazon.com         6.7     yeah
>   speakeasy.net      6.6
>   paypal.com         6.6     yeah
>   msn.com            6.6
>   roving.com         6.5
>   nytimes.com        6.5     yeah
>   m0.net             6.5
>   classmates.com     6.5
>   exacttarget.com    6.4
>   sparklist.com      6.2
>   sourceforge.net    6.1
>   securityfocus.com  6.1
>   spamarrest.com     6.0
>   rm04.net           6.0
>   redhat.com         6.0
>   foolsubs.com       6.0     yeah
>   bluehornet.com     6.0
>
> So, based on all that, I'm thinking we could experimentally add SPF_PASS
> whitelists for:
>
>   ebay.com
>   amazon.com
>   paypal.com
>   nytimes.com
>   foolsubs.com
>   webmd.com
>   match.com
>
> I checked NANAE and the above domains seem to be pretty clean, and this
> jibes with my recollection.

+1.

--j.
Re: Target Milestone of Future is harmful
Thomas Schulz writes:
> I would like to suggest that having a Target Milestone of "Future" for a
> bug is harmful.  It was probably necessary when you were trying to get
> 3.0.0 out and you were not sure what the next version number would be,
> but now it seems to be a way for a bug to fall into a black hole.  It
> seems that if a bug is not grabbed by someone within a few hours of
> being submitted, it is lost.

It's a manageability thing.  We don't have someone who can sit there continually reprioritising bugs :(

If you have bugs with TM set to "Future" and you think they're implementable sooner ;), feel free to post a comment and pipe up.  In particular, getting a patch that implements the feature is a *lot* more likely to get a bug a solid milestone.

--j.
Re: [Bug 4072] SPF_PASS false match
Daniel Quinlan writes:
> Do we want to fix this for 3.0.3, or just leave it for 3.1?  I'm okay
> with backporting, but I think we're nearing the point at which we
> buckle down on 3.1 and focus on it.  The tree is remarkably stable
> right now and there are a number of significant improvements, so I'm
> starting to think about actually sticking to the aggressive schedule
> that was proposed a few months ago.  :-)

+1.  I think we should only do a 3.0.3 if something serious (security, data loss, etc.) comes up.

--j.
Re: IP_IN_RESERVED_RANGE = IP_PRIVATE
Daniel Quinlan writes:
> I'd like to make IP_IN_RESERVED_RANGE go away.
>
> In an ideal world, but I know Justin will object so I won't propose it,
> I would nuke it.  Since it's possible some poor unsuspecting third-party
> plugin is using it in the same broken way as our code was just
> yesterday, I propose we merely set it equal to the new IP_PRIVATE
> constant.
>
> If you read the comment:
>
>   # Initialize a regexp for reserved IPs, i.e. ones that could be
>   # used inside a company and be the first or second relay hit by
>   # a message. Some companies use these internally and translate
>   # them using a NAT firewall. These are listed in the RBL as invalid
>   # originators -- which is true, if you receive the mail directly
>   # from them; however we do not, so we should ignore them.
>
> That's how it's defined anyway -- an internal address.
>
> Does that sound okay?

+1
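The "set it equal" proposal amounts to keeping the old constant name as an alias.  A minimal sketch of that pattern, assuming a pattern that covers only RFC 1918 space plus loopback and link-local; SpamAssassin's real IP_PRIVATE regex isn't shown in this thread and may cover more or less than this.

```perl
use strict;
use warnings;

# Assumed pattern: RFC 1918 private space plus loopback and link-local.
use constant IP_PRIVATE => qr{^(?:
    10\.\d{1,3}\.\d{1,3}\.\d{1,3}                   # 10.0.0.0/8
  | 172\.(?:1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3}     # 172.16.0.0/12
  | 192\.168\.\d{1,3}\.\d{1,3}                      # 192.168.0.0/16
  | 127\.\d{1,3}\.\d{1,3}\.\d{1,3}                  # loopback
  | 169\.254\.\d{1,3}\.\d{1,3}                      # link-local
  )\z
}x;

# Old name kept as an alias, so third-party code that still refers to it
# keeps compiling and now gets the "internal address" meaning.
use constant IP_IN_RESERVED_RANGE => IP_PRIVATE;

for my $ip (qw(10.1.2.3 172.31.0.1 172.32.0.1 192.168.0.10 8.8.8.8)) {
  printf "%-13s %s\n", $ip, ($ip =~ IP_IN_RESERVED_RANGE) ? 'private' : 'public';
}
```

Because both names refer to the same compiled pattern, there's nothing for the two to drift apart on later.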
Re: svn commit: r124477 - /spamassassin/trunk/lib/Mail/SpamAssassin/EvalTests.pm /spamassassin/trunk/rules/20_body_tests.cf /spamassassin/trunk/rules/70_testing.cf
Loren Wilton writes:
> oh good, so you've changed your mind since
> http://bugzilla.spamassassin.org/show_bug.cgi?id=3781#c3 then ;)
>
> Somewhat.  I still think it should be a plugin.
>
> There's a problem with plugins I hadn't realized when they were
> originally being advertised as the universal solution to oddball rules.
> The problem is that they aren't.  Anyone can write a jive rule, if
> allow_user_rules is set.  But nobody but the system administrator can
> install a plugin.  And it seems that even invocations of a plugin aren't
> supposed to show up in the user_prefs file, even with allow_user_rules.
>
> So to be useful here (for the general case, which is what interests me)
> this would have to be a plugin that effectively exported a new rule
> base name, and the plugin would then take a general RE against that base
> type.  Which is the same as inventing a new rule base type, except that
> not as many people will be able to use it.

I'm not sure what you mean here.  could you add some examples?

--j.
Re: BZ box being hammered
Theo Van Dinter writes:
> I was noticing that the BZ box gets hammered, at the moment due to
> buildbot:
>
> If we're going to run buildbot on there, it should at least be done in
> serial and not parallel.

not sure how easy that is :(  we could add some hackery to the buildbot-slave configuration...

> I'm also not sure that it's the best box for a mass-check either, but
> they're having issues at the moment it looks like:

ick.  I need a new box ;)

--j.
Re: svn commit: r125369 - /spamassassin/trunk/rules/70_scraped.cf
Daniel Quinlan writes:
> Actually, can I suggest a different naming convention?
>
>   T_<bug number>_<test name>
>
> For example, T_4081_FOO_BAR
>
> Short and easy to look up.
>
> Also, does your code handle naming for new/overlapping/existing
> predicates for meta rules?

it's a hornet's nest ;)

  1. T_ prefix will overlap with our own T_ prefix.  I'd prefer a new prefix to keep them separate.  T_MC_ might work, but I used MC_ here.  some votes on what people would prefer are welcome ;)

  2. there may be multiple revisions of rules and predicates with the same name inside one bug number, as in bug 2243, so it'd need scoping by both bug number and comment number; scoping by bug number alone will fail in this situation.

  3. there may be new rules using *existing* rules as meta predicates, so it can't rename all rulenames found; just the ones where the rules are defined in that comment's ruleset.

Current code does all those 3.  However it does have a failing:

  - It cannot deal with the case (as in bug 2243 comment 14) where a set of new rules are posted that use rules from a previous comment (cmt 13) as predicates.

However, I can't see a good way to deal with that case without breaking the case where a new comment revises a ruleset from an earlier comment without suffering rule name clashes.  So I think that's an acceptable limitation.

Finally, naming: due to our absurd rule-name-length limitation policy ;), we cannot do the sensible thing, which would indeed be:

  MC_{bugnum}_{cmtnum}_{rulename}

but the current naming scheme is:

  MC_{rulename}_{rnd}

where {rnd} is a 3-digit random number.  (the idea is that hopefully the rulename will be short enough to scrape past the length limit, since --lint is used to ensure rules are valid before they're checked into 70_scraped.cf.)

My current plans are to fix this by removing that stupid limit on rulename and description lengths.  They've caused WAY, *WAAAY* more problems than they solved and I'm sick to death of them! :(  Some sensible wrapping code would be simpler, and save EVERYONE a lot of trouble.

Once I do that I'll fix automc to use the sensible naming scheme.

--j.
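A rough sketch of the scoping rule in point 3, using the MC_{bugnum}_{cmtnum}_{rulename} scheme described as "the sensible thing": only rules *defined* in one comment's ruleset are renamed, while existing rules referenced in meta expressions keep their names.  scope_rules() is hypothetical and is not automc's actual code; like the current code, it doesn't handle predicates defined in an earlier comment.

```perl
use strict;
use warnings;

# Rename rules defined in one bug comment's ruleset; leave others alone.
sub scope_rules {
  my ($bug, $cmt, @lines) = @_;
  my %defined;
  for (@lines) {
    $defined{$1} = 1 if /^(?:body|rawbody|header|uri|full|meta)\s+(\w+)/;
  }
  my $rename = sub {
    my ($n) = @_;
    return $n unless exists $defined{$n};
    # keep a leading "__" so sub-rules keep their sub-rule prefix
    return $n =~ /^__(\w+)$/ ? "__MC_${bug}_${cmt}_$1" : "MC_${bug}_${cmt}_$n";
  };
  my @out;
  for my $line (@lines) {
    if ($line =~ /^(\w+)(\s+)(\w+)(.*)$/) {
      my ($directive, $sp, $name, $rest) = ($1, $2, $3, $4);
      $name = $rename->($name);
      # inside a meta expression, predicates may be local or pre-existing
      $rest =~ s/\b(\w+)\b/$rename->($1)/ge if $directive eq 'meta';
      $line = "$directive$sp$name$rest";
    }
    push @out, $line;
  }
  return @out;
}

my @rules = (
  'body __FOO_A /foo/i',
  'meta FOO_META (__FOO_A && !EXISTING_RULE)',
  'describe FOO_META Local meta using an existing rule',
  'score FOO_META 0.1',
);
print "$_\n" for scope_rules(2243, 13, @rules);
```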
another bz error
on committing comment 11 to bug 4058:

  Internal Error

  Bugzilla has suffered an internal error.  Please save this page and send it to dev@spamassassin.apache.org with details of what you were doing at the time this message appeared.

  URL: http://bugzilla.spamassassin.org/process_bug.cgi

  undef error - Can't find param named messageid at Bugzilla/Config.pm line 150.

--j.
Re: svn commit: r125477 - /spamassassin/trunk/rules/70_scraped.cf
Daniel Quinlan writes:
> Justin, can you change a few things in addition to the names?
>
>  - check-in via some role account

asking infrastructure about this...

>  - don't rename the tests every time - this is maddening

ok, let's see what I can do there.

--j.
Re: real-time network results
Daniel Quinlan writes:

Theo Van Dinter [EMAIL PROTECTED] writes:

Yeah, that mostly sums up my feelings.  The current RBL information tells us when a positive lookup occurs, but not when a negative lookup occurs.

True, you have to assume a negative lookup if it doesn't show and the reuse mapping indicates it was present.  I'll provide a way for people to disable reuse for rules that they normally don't run with.

something in the mass-check user_prefs file, maybe.

Note that even if some of those non-hits are due to downtime or timeouts or whatever, those *should* be considered as the realtime result since they affect accuracy.

yes, that's very true.

I'd really like to have RBL record all queries made and the results thereof, then all the issues above go away -- name changes and logic changes just look at the cached result, rule additions w/out cached result cause lookups at run-time as they are now.

Maybe, but that is still off in the future.  Huge delay to get that throughout all mail.  We get 97% with names and 99% with names and dates.

a case of the best being the enemy of the good, I think.

--j.
results from some DK testing
So here's a quick look at some DomainKeys rule freqs, from a quick mass-check of the last ~10k ham and ~10k spam in my corpus (mass-check --tail -j=8 --net --rules '^DK'):

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
   19991    9998     9993    0.500   0.00    0.00  (all messages)
 100.000  50.0125  49.9875   0.500   0.00    0.00  (all messages as %)
   5.783   0.0500  11.5181   0.004   1.00    0.00  DK_SIGNED
   0.375   0.0100   0.7405   0.013   0.33   -0.10  DK_VERIFIED
   0.000   0.0000   0.0000   0.500   0.33    0.00  DK_POLICY_SIGNALL
   5.613   6.8714   4.3530   0.612   0.00    0.00  DK_POLICY_SIGNSOME
   4.972   6.3013   3.6425   0.634   0.00    0.00  DK_POLICY_TESTING

Some notes:

- DK_SIGNED means the message had a DK signature.  DK_VERIFIED means that it passed.  most of the failures are due to the various crud added to all messages in my corpus, such as:

  - SpamAssassin markup.  we have a bug open to move this to the start of the headers, instead of the end, which will fix this.  However we may have to hack a way to ignore those hdrs in the DK plugin, in existing corpora, otherwise mass-check figures will be really crappy (as above).

  - other crud added: 'Status', 'X-UID', 'X-Keywords' (all added by my IMAP server), and 'X-MH-Thread-Markup' (added by my mhthread script).

  Problem is, most DK records (and the recommended style of signature in the draft, iirc) sign everything *below* the signature point, on the assumption that further transitions from the sender to the receiver will only ever *prepend* headers to the existing set, and that the verification will take place inside the recipient's external-MX MTA.  My mail has already been through a variety of MTAs and both ends of an MDA.

  FWIW, GMail's DK record takes a more IIM-ish approach of signing a specific set of important headers like From, Subject, To et al., so virtually all of the DK_VERIFIED hits are from GMail.

- so far DK_SIGNED's a great ham sign on its own (not that I'm suggesting we should use that, of course).  the 4 spam mails look like they'd pass verification -- they're 419 spams sent by hand through yahoo and gmail's webmail interfaces.  (yes, they do these by hand.)

- obviously, a rule for "DK verification failed", ie. (DK_SIGNED && !DK_VERIFIED), would make a lousy anti-spam rule -- it's hitting almost all ham here.  that may clear up a bit if we can figure out a way to deal with the "headers appended in passage" issue, but possibly not a whole lot, given the fact that DK sigs are broken by mailing lists appending footers to the body etc.

- in terms of rules, (DK_SIGNED && DK_VERIFIED && DOMAIN_IN_SOME_WHITELIST_OR_ANOTHER) seems like the most likely aim.  but we'll need to figure out how to fix those header-manglings to get the hitrate anywhere useful.  (0.74% isn't really worth a DNS lookup.)

- the DK_POLICY ones are to get an idea of what people are publishing in their DK records.  looks like nobody's yet saying "we sign all outbound mail" ;)

- speeds of scans using just the DK rules (count, then scantime), in spam:

     4693  0
     3722  1
      728  2
      472  3
      190  4
      110  5
       56  6
        9  7
        8  8
       10  9

  and in ham:

     6382  0
     2338  1
      799  2
      349  3
      107  4
       11  5
        7  6

  (generated with: perl -pe 's/^.*scantime=//; s/,.*$//;' ham.log | sort | uniq -c)

  so it's reasonably fast.  (a single DNS lookup takes place on every message.)

--j.
Re: removing the rule-name-length limit (was Re: svn commit: r125722)
Daniel Quinlan writes:
> [EMAIL PROTECTED] (Justin Mason) writes:
> > ah, I didn't post examples of what the new formatting looks like --
> > here it is in report_safe 1:
>
> The example I posted *was* report_safe 1, so the new formatting does
> not look like that at all.

hmm.  I didn't spot your example -- mistook it for a sig!  I need to figure out why yours looks different... let me take a look.

> > Now, I can't find any agreement in bugzilla that those limits should
> > have been imposed. ;)
>
> Nice try, but today's code changes are the things that get votes, not
> code changes from 2 years ago...

sure.  just pointing it out.

> > our current translations have the following description lengths:
> >
> >             under 50 chars   too long   % too long
> >   German         201            380         65%
> >   French         236            158         40%
> >   Dutch          476            113         19%
> >   Polish         275            107         28%
> >
> > in other words *none* of our translations yet implement the 50
> > character limit (bug 4007, bug 4040).  In bug 4040, Klaus notes that
> > he doubts it's *possible* to bring German descriptions under 50
> > characters anyway.
>
> I'm more willing to discuss increases to description lengths (given the
> expansion factor of many languages over English) than rule name lengths,
> as I think carrying over to two lines does not render reports that
> unreadable.  However, I think increasing the rule name length from 22 is
> too much.  If I look at all of the rule name lengths from the custom
> rule sets (including a French one) on the Wiki, 9992 rules have a length
> of 22 or lower and only 16 rules have a length of 23 or 24 (none are
> longer than 24).
>
> > 1. allows German-language 70-character descriptions ;)
>
> German typically requires approximately 25-35% more characters than
> English, so changing the limit to 65 or 70 characters would be fine
> with me.
>
> > 2. allows long enough rule names to support the additional 13
> > characters that should be added to each rule name in automc (bug ID,
> > comment number, T_MC_ prefix, and underscores between them, ie.
> > T_MC_rulename_2243_13).  right now, we avoid this more-or-less by just
> > adding 7 chars, MC_rulename_9Ac.  But still, make test and buildbot
> > will fail if a bug with a rule name longer than 15 characters in it is
> > mass-checked.
>
> Just ignore the limits for T_ rules.  That's fine with me.

OK -- I'm happy to go for:

  - relax the description limit to allow 2-line descs
  - keep 22-char limit on rulenames
  - except for T_-prefix names

That still leaves the problem of a few unreadable rule names, like FROM_WEBMAIL_END_NUMS6, though.  I'd like to relax the rulename limit a *little* -- 28 chars maybe?

--j.
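As a sketch of the compromise above (22-character cap on rule names, T_-prefixed testing rules exempt), a lint-style check could look like the following.  MAX_RULENAME_LEN and rulename_ok() are illustrative names, not SpamAssassin's; the real check lives in --lint and may be structured differently.

```perl
use strict;
use warnings;

# Cap taken from the thread; the constant name is hypothetical.
use constant MAX_RULENAME_LEN => 22;

sub rulename_ok {
  my ($name) = @_;
  return 1 if $name =~ /^T_/;    # testing rules are exempt
  return length($name) <= MAX_RULENAME_LEN ? 1 : 0;
}

for my $name (qw(FROM_WEBMAIL_END_NUMS6 MC_SOME_LONG_RULENAME_9Ac T_MC_SOME_LONG_RULENAME_2243_13)) {
  printf "%-34s %2d chars  %s\n", $name, length $name, rulename_ok($name) ? 'ok' : 'too long';
}
```

This shows why the current automc names break: an 18-character rule name plus the 7 extra characters of MC_..._9Ac lands at 25, while the same name with the T_MC_ prefix passes because of the exemption.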
Re: svn commit: r125877 - /spamassassin/trunk/t/desc_wrap.t
Malte S. Stretz writes:
> On Friday 21 January 2005 04:29 CET [EMAIL PROTECTED] wrote:
> > Author: jm
> > Date: Thu Jan 20 19:29:20 2005
> > New Revision: 125877
> >
> > URL: http://svn.apache.org/viewcvs?view=rev&rev=125877
> > Log:
> > fix desc_wrap.t to deal with different Text::Wrap behaviour on older
>
> Maybe we should just require the newer version of Text::Wrap?  Or
> implement our own wrapping algorithm as Daniel suggested, though I
> prefer to reuse the existing module.

no need -- it's now fixed anyway, so no longer an issue.

--j.
Re: making spamassassin a meta document
I'm happy with the sa-check idea, as long as we keep a spamassassin wrapper that just does an exec().

easy enough, and very sensible.  +1

I think the POD docs from spamassassin should be split into the sa-check POD and whatever other sa-blah scripts we come up with from that.

also, +1 on Michael's sa-history script idea.

however the cvs/svn style, I'm not fond of.  reasons:

- without require-ish hacking, it'll mean all the commands would get use'd -- increasing RAM usage.  I'd prefer to avoid that.

- having multiple commands as prefix-command, e.g. sa-learn, sa-check etc. is good as a UNIX UI -- sa-<TAB> to get the list of possible commands.

- the POD file for that one wrapper would be gigantic and unusable.  we could go for an svn-style "spamassassin help", but then we'd have to write our own documentation-reading subsystem, which seems like wasted effort when POD is already there and already working nicely on all platforms.  also, TBH I find that kind of subsystem to be an annoying UI -- do I read the man page?  do I type "blah help"?  "blah help commands"?  etc.

--j.

Malte S. Stretz writes:
> On Sunday 23 January 2005 00:22 CET Daniel Quinlan wrote:
> > I've been thinking about bug 3635.
> >
> > One idea:
> >
> >   rename spamassassin to sa-check
> >   make spamassassin a meta document that execs sa-check for backwards
> >   compatibility
> >
> > Another idea:
> >
> >   make spamassassin a meta document that execs sa-check for backwards
> >   compatibility
> >   move spamassassin pod to spamassassin-run document
>
> Yet another idea: make spamassassin a caller for all tools, a bit like
> the cvs commands.  Like this:
>
>   old              | new                 | calls
>   -----------------+---------------------+-----------
>   spamassassin     | spamassassin check  | sa-check
>   sa-learn         | spamassassin learn  | sa-learn
>   spamassassin -r  | spamassassin report | sa-report
>   spamassassin -d  | spamassassin clean  | ...
>
> All sub-commands could be moved to /usr/lib/spamassassin (and out of
> $PATH when some compatibility flag is disabled) at some point.
new DK results
from a mass-check run I did last night.  these are more promising; 12% of ham whitelistable:

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
   19992    9999     9993    0.500   0.00    0.00  (all messages)
 100.000  50.0150  49.9850   0.500   0.00    0.00  (all messages as %)
   6.338   0.0500  12.6288   0.004   1.00    0.00  DK_SIGNED
   0.005   0.0000   0.0100   0.000   0.53    0.00  DK_POLICY_SIGNALL
   0.485   0.0400   0.9307   0.041   0.47   -0.00  DK_VERIFIED
   4.627   5.5506   3.7026   0.600   0.06    0.00  DK_POLICY_TESTING
   5.162   6.1206   4.2029   0.593   0.00    0.00  DK_POLICY_SIGNSOME

this was achieved by adding code which strips off known appended headers from the message, such as X-Spam-*, Status, IMAPBase etc.

Records that passed verification were:

    954 gmail.com
    270 yahoo.com
     10 crynwr.com
      9 earthlink.net
      6 space.net
      5 yahoo-inc.com
      5 omniti.com
      1 sendmail.com
      1 altn.com

and that's it.  AFAICS, most of those domains have only one selector, so that's a puny 9 DNS lookups?  looking quite promising.  ;)

--j.
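A minimal sketch of the header-stripping idea: drop fields that local delivery added after signing, including their folded continuation lines, before verification.  The field list mirrors the examples given in these two posts (X-IMAPbase is a guess at "IMAPBase"), and strip_appended_headers() is a hypothetical helper, not the DK plugin's actual code.

```perl
use strict;
use warnings;

# Fields assumed to be added locally after the message was signed.
my $appended_re = qr/^(?:X-Spam-[\w-]+|Status|X-UID|X-Keywords|X-IMAPbase|X-MH-Thread-Markup)$/i;

# Remove those fields (and any folded continuation lines) from a raw
# header block before handing it to a DomainKeys verifier.
sub strip_appended_headers {
  my ($header_block) = @_;
  my @kept;
  my $skip = 0;
  for my $line (split /\n/, $header_block) {
    if ($line =~ /^[ \t]/) {          # continuation of the previous field
      push @kept, $line unless $skip;
      next;
    }
    my ($name) = $line =~ /^([^:\s]+):/;
    $skip = (defined $name && $name =~ $appended_re) ? 1 : 0;
    push @kept, $line unless $skip;
  }
  return join('', map { "$_\n" } @kept);
}

my $hdrs = "From: someone\nX-Spam-Status: No,\n\ttests=none\nSubject: hi\nStatus: RO\n";
print strip_appended_headers($hdrs);
```

Matching on a fixed list keeps the change conservative: anything not known to be appended locally is left for the verifier to judge.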
Re: [SURBL-Discuss] Re: Revisiting high-level 3.1 goals
Daniel Quinlan writes:
> Raymond Dijkxhoorn [EMAIL PROTECTED] writes:
> > Please let us know what we should do, cutting out we should announce,
> > the actual removal is just altering one export script...
>
> Considering that SA hasn't shipped with JP yet and that those hosts are
> already caught in WS (which predates JP), I'd announce that you're
> making the change in a week and then make the change.

btw, I think requiring people to upgrade ASAP isn't necessarily a great idea; we can avoid it by setting up a new BL for "WS minus JP".  then 3.1.0 can look up

  - JP
  - WS_minus_JP

and existing clients can look up

  - WS (which includes JP as before)

and upgrade at their leisure...

--j.
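On the export side, "WS minus JP" is just a set difference.  A minimal sketch, assuming the two lists are available as plain domain arrays; ws_minus_jp() is hypothetical, and how SURBL's real export script reads its data isn't described here.

```perl
use strict;
use warnings;

# Entries in WS that are not also in JP, compared case-insensitively.
sub ws_minus_jp {
  my ($ws, $jp) = @_;
  my %in_jp = map { ( lc($_), 1 ) } @{$jp};
  return grep { !$in_jp{ lc $_ } } @{$ws};
}

my @ws = qw(spam-a.example spam-b.example Spam-C.example);
my @jp = qw(spam-c.example spam-d.example);
print "$_\n" for ws_minus_jp(\@ws, \@jp);
```

Publishing this as its own zone leaves the existing WS zone unchanged, which is what lets older clients keep working without an upgrade.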
Re: Revisiting high-level 3.1 goals
Robert Menschel writes:
> Hello Daniel,
>
> Saturday, January 29, 2005, 9:46:05 PM, you wrote:
>
> - higher accuracy: lower FPs and lower FNs (rules, rules, rules...
>   this also includes some notion of speeding up the mass-check process)
>
> DQ> I've been banging away on this.  We're closer to fixing the
> DQ> autolearn thing and Henry has expressed some interest in
> DQ> coordinating a test of "perfect" (train on everything) and
> DQ> "perfect-sample" (train on sample) learning.
>
> DQ> bin-doph's ReplaceTags plugin will also really help with rule
> DQ> writing, I think, so I hope we get that into the tree soon.
>
> DQ> I also now have a working prototype of network-test reuse code and
> DQ> boy does it speed up network mass-checks.
>
> Look forward to all of those.
>
> I'm also trying to develop a mass-check installation/setup script of my
> own, based on what you were able to give me last year, which will enable
> people to simply run a script and build a mass-check system.  It will
> enable people to do their own mass-checks the way we do in SARE, and it
> will also enable them to participate in the primary nightly mass-check
> run.  My install/setup is still very rough, and has a long way to go, so
> I don't want to try to put a time table on it, but I have hopes it will
> be a help to people.

I'd really like to get mass-check a *lot* more usable -- not sure exactly what would be involved, though. :(  That was the aim of Duncan's patch in bz, but unfortunately we didn't get that into 3.0.0 and I think it's a little unlikely to be quite usable by now.

--j.
Re: svn commit: r149224 - in spamassassin/trunk: lib/Mail/SpamAssassin/PerMsgStatus.pm lib/Mail/SpamAssassin/Plugin.pm lib/Mail/SpamAssassin/Plugin/DefaultAutoLearnDiscriminator.pm rules/10_misc.cf rules/init.pre
Michael Parker writes:
Few things:
1) I thought plugin calls couldn't return values?

actually -- they can. (I think in the config file case, it was the type of the return value changing frequently that was problematic.)

2) I like the Plugin API, but why not keep the default in the code and allow added plugins to override? Doesn't need its own default plugin.

well, effectively doing this as a default plugin *does* this, without adding extra code for plugins to indicate "do not run the default code", which is why I did it this way. (however, perhaps it doesn't need to be in the Mail::SpamAssassin::Plugin hierarchy; it could be named something else.)

3) MANIFEST, assuming the plugin stays.

oops! Anyway, I'm fine with changing the details if necessary.

--j.
Re: svn commit: r149224 - in spamassassin/trunk: lib/Mail/SpamAssassin/PerMsgStatus.pm lib/Mail/SpamAssassin/Plugin.pm lib/Mail/SpamAssassin/Plugin/DefaultAutoLearnDiscriminator.pm rules/10_misc.cf rules/init.pre
Michael Parker writes:
On Sun, Jan 30, 2005 at 10:49:04PM -0800, Justin Mason wrote:
Michael Parker writes:
Few things:
1) I thought plugin calls couldn't return values?

actually -- they can. (I think in the config file case, it was the type of the return value changing frequently that was problematic.)

So, in the case where more than one plugin handles a call, which value is returned? Last one run wins?

if any return a defined value, that is used. actually, it's ||= -- so in this case it's a little more complex, because 0 is a supported return value as well as undef. hmm. it may be better to have the last plugin get the return value.

2) I like the Plugin API, but why not keep the default in the code and allow added plugins to override? Doesn't need its own default plugin.

well, effectively doing this as a default plugin *does* this, without adding extra code for plugins to indicate "do not run the default code", which is why I did it this way. (however, perhaps it doesn't need to be in the Mail::SpamAssassin::Plugin hierarchy; it could be named something else.)

Something like:

  $foo = call_plugin
  if !defined($foo)
    the default code goes here to set $foo

There'd have to be an additional boolean indicating "a plugin handled this", rather than just returning undef -- since the API is tri-state (undef/0/1 as return values).

If someone disabled the plugin in init.pre and did not install their own plugin, what would happen?

no autolearning occurs, everything else works as expected. ;)

--j.
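To make the return-value question concrete, here is a small model of two ways to combine per-plugin results for a tri-state (undef/0/1) hook. It is a sketch, not SpamAssassin's call_plugins() code: the function names are hypothetical, and an empty argument stands in for Perl's undef.

```shell
#!/bin/sh
# Model of combining per-plugin return values for a tri-state hook.
# An empty argument stands in for Perl's undef; "0" and "1" are literal.

# Perl-style "$ret ||= $value": a later value replaces $ret whenever $ret
# is still false (undef or 0), so an explicit 0 can be overwritten.
combine_or_assign() (
  ret=""
  for v in "$@"; do
    if [ -z "$ret" ] || [ "$ret" = "0" ]; then
      ret=$v
    fi
  done
  printf '%s\n' "${ret:-undef}"
)

# Alternative: the last plugin that returns a defined value wins;
# undef means "no opinion" and leaves the previous answer alone.
combine_last_defined() (
  ret=""
  for v in "$@"; do
    if [ -n "$v" ]; then
      ret=$v
    fi
  done
  printf '%s\n' "${ret:-undef}"
)

combine_or_assign 0 ""      # undef: the explicit 0 is lost
combine_last_defined 0 ""   # 0
combine_or_assign 1 0       # 1: the first true value sticks
combine_last_defined 1 0    # 0
```

With ||=, a plugin that explicitly answers 0 loses to any later plugin that answers 1, and even to a later undef; letting the last defined value win keeps an explicit 0 unless another plugin overrides it.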
Re: svn commit: r149224 - in spamassassin/trunk: lib/Mail/SpamAssassin/PerMsgStatus.pm lib/Mail/SpamAssassin/Plugin.pm lib/Mail/SpamAssassin/Plugin/DefaultAutoLearnDiscriminator.pm rules/10_misc.cf rules/init.pre
Michael Parker writes:
On Mon, Jan 31, 2005 at 05:25:25PM -0500, Theo Van Dinter wrote:
On Mon, Jan 31, 2005 at 04:16:01PM -0600, Michael Parker wrote:
Which might be ok, but I can promise you that someone is going to go through and either rm init.pre or comment out every loadplugin line and then start asking questions about why their system isn't autolearning.

Yeah, but they'll do the exact same thing with SURBL, Razor, etc.

I'm not so worried about those; those are all pretty much self-contained, so if they get shut off, no harm done. It's turning off pieces of a system in the core that bothers me.

ok, you've convinced me... feel free to refactor that back into core, I think. It seems this *is* a little more core than Razor, Pyzor et al. (probably easiest to just rename the module back into the Mail::SpamAssassin::* namespace, then add the if defined() glue after the call_plugins call, rather than pushing the subs back into PerMsgStatus entirely.)

--j.
Re: optional vs. standard plugins
Daniel Quinlan writes:
Michael Parker wrote:
Which might be ok, but I can promise you that someone is going to go through and either rm init.pre or comment out every loadplugin line and then start asking questions about why their system isn't autolearning.

Losing autolearning if someone deletes init.pre is completely acceptable. Autolearning *should* be optional and pluggable. Making it pluggable allows people to experiment and try out other autolearning mechanisms, and I suspect we'll see some usage of the API soon. ;-) We could also add a new autolearn state like "notloaded".

That may indeed be a good idea, since this is now a new way for people to screw up their configs ;)

Theo Van Dinter [EMAIL PROTECTED] writes:
I think we should shoot for a goal of: when all plugins are disabled, the system should still do the right thing. If that means that we at least provide a default inline that can be overridden by a plugin, then that is how we should do it.

Not autolearning if it has been disabled *is* the right thing. Things work fine if autolearning is off. Also, our current autolearning code does not improve results by that much in practice (which is why it needs to be revisited and other ways to autolearn explored). See Gordon C.'s paper for those results.

I'll provide a slightly different version: for code that people are likely not to override (such as autolearning), we should probably just have it be in the code by default and let plugins override as necessary.

I disagree in this case, although I think there are probably some cases where things are likely to not be overridden.

Users are going to encounter plugins and they're now a major part of basic SpamAssassin functionality (much like Apache httpd, incidentally)

not a coincidence...

, we should just document things well enough. If people comment out stuff without thinking, then there's not too much we can do about it.

That's true. init.pre is exactly analogous to httpd.conf; an Apache install can be rendered thoroughly useless by turning off the wrong plugins.

For plugins that are likely to not be overridden, I'd be fine with splitting init.pre into two or more files, like:

- standard.pre
- optional.pre
- experimental.pre

or whatever. That would go a long way to guiding people as to how seriously they need to think before commenting stuff out.

And, FWIW, I think I wrote the pre code to load all files that end in .pre, so this should work if we want that.

Of course, I agree ** 100% ** that everything should work (as in: not fail) if all plugins are commented out. There might be a few cases where plugins have cross-dependencies, but we should make sure our code deals with those and acts appropriately (warn, die, dbg, or whatever, but *no* straight Perl interpreter errors!). Also, putting a line next to the AutoLearnThreshold load line such as:

  # at least one AutoLearn plugin needs to be loaded for autolearning to work

is more than enough to prevent a stupid commenting out. If people just comment stuff out without thinking or delete init.pre, we can't save them.

OK, I agree with everything in this message ;)

--j.
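A quick way to see which files a "load all files that end in .pre" rule would pick up from a directory, and in what order, is a shell glob. This is a sketch that assumes the loader sorts file names the way the shell does; it only lists names and does not parse or load anything.

```shell
#!/bin/sh
# List the *.pre files in a directory in shell glob order, which is sorted.
# Other files (local.cf, etc.) are ignored; nothing is parsed or loaded.
list_pre_files() (
  for f in "$1"/*.pre; do
    [ -e "$f" ] || continue   # no match: the pattern stays literal
    printf '%s\n' "${f##*/}"  # print just the file name
  done
)

dir=$(mktemp -d)
touch "$dir/standard.pre" "$dir/optional.pre" "$dir/experimental.pre" "$dir/local.cf"
list_pre_files "$dir"
# experimental.pre
# optional.pre
# standard.pre
rm -r "$dir"
```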
Re: optional vs. standard plugins
Michael raised a good point on IRC:

Herk: Theo makes a good point on the existing init.pre file; upgrades aren't going to get the new loadplugin lines added to their init.pre
jmason: well, it's exactly analogous to Apache's httpd.conf file
Herk: true, but can you disable some small piece of core functionality by not updating your conf file? I'm having trouble thinking of a concrete example in other types of servers
Herk: that match this case
Herk: FYI, I'm not -1 on the plugin, just stating an opinion that I don't believe having a plugin for the default case is needed/wise
jmason: hmm. you know, that's a point alright, this may cause upgrade issues. I hadn't considered that
jmason: specifically that 3.0.0 already has an init.pre, and we haven't currently got code that'll overwrite one of those

So, in other words, if users upgrade from 3.0.x to 3.1.0, then unless we add code to our installer to deal with this, they'll have to manually edit init.pre to add loadplugin lines for the new default discriminator.

--j.
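For the upgrade problem above, a check of this kind could flag an existing init.pre that lacks a loadplugin line for a new plugin. It is a hypothetical helper, not part of SpamAssassin's installer; it treats commented-out lines as not loaded, and the plugin name is taken from the r149224 commit path.

```shell
#!/bin/sh
# has_loadplugin FILE PLUGIN: succeed if FILE has an active (uncommented)
# "loadplugin PLUGIN" line. Hypothetical helper, not part of SpamAssassin.
has_loadplugin() {
  grep -Eq "^[[:space:]]*loadplugin[[:space:]]+$2([[:space:]]|\$)" "$1"
}

# Example: an init.pre carried over from 3.0.x that predates the plugin.
pre=$(mktemp)
printf '%s\n' 'loadplugin Mail::SpamAssassin::Plugin::URIDNSBL' > "$pre"
plugin=Mail::SpamAssassin::Plugin::DefaultAutoLearnDiscriminator
if ! has_loadplugin "$pre" "$plugin"; then
  echo "warning: $pre has no loadplugin line for $plugin; autolearning will not happen" >&2
fi
rm -f "$pre"
```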
Re: svn commit: r151753 - spamassassin/trunk/lib/Mail/SpamAssassin/Plugin.pm
[EMAIL PROTECTED] writes:
+Note: there are no guarantees that the internal data structures of
+SpamAssassin will not change from release to release. In particular to
+this plugin hook, if you modify the rules data structures in a
+third-party plugin, all bets are off until such time that an API is
+present for modifying that configuration data.

... that makes the new plugin API sound quite a bit less useful ;) what's it being added for?

--j.
Re: svn commit: r151753 - spamassassin/trunk/lib/Mail/SpamAssassin/Plugin.pm
Daniel Quinlan writes:
[EMAIL PROTECTED] (Justin Mason) writes:
... that makes the new plugin API sound quite a bit less useful ;) what's it being added for?

ReplaceTags. I hope to eventually clean up the internals and make it okay after 3.1, but it's just a bit too hairy right now to feel okay about making the API usable for random third-party plugins (it's fine as long as the integrator checks compatibility).

yeah, I was hoping for the "pass rule code for rules with certain tflags into the plugin for substitution" approach, as I mentioned before -- that doesn't require the plugin to delve into the Conf structure to do it.

The API is still generally useful if you have *anything* to do at end of parsing (for example, tying a DB as in accessdb), stuff that doesn't involve internal APIs.

ah, that's a good point. that hadn't occurred to me...

--j.
Re: RFC: Plan for faster updates
Theo Van Dinter writes:
Ok, here are my thoughts about how to do faster updates, i.e. how to release rules + scores faster, potentially multiple times a day. I currently only think rules + scores ought to be released this way -- people aren't going to be comfortable with automated code updates IMO. Code/plugins are best left to full releases. (Plugin support could be easily added later on, btw.)

Pseudo-code is below, but here are some background details:

Updates occur from channels. The default channel is updates.spamassassin.org, but the user can specify any number of additional channels on the command line. These can either be provided by us (think of updates being stable vs. experimental vs. ...), or by some third party (as long as they provide the same infrastructure...)

cool.

Updates have version numbers. The format of the value is irrelevant, as long as it's monotonically increasing. For our updates I was thinking SVN revision, but we could also do MMDDVV à la DNS SOA, etc.

Versions are tracked per channel and SpamAssassin version. To check for updates, do a DNS TXT query à la z.y.x.updates.spamassassin.org, where z.y.x is the version of SpamAssassin being used in reverse (x.y.z being e.g. 3.0.2). For simplicity, wildcards can be used on the DNS server to match a whole set of releases. An example:

  *.0.3.updates.spamassassin.org TXT 154203
  *.1.3.updates.spamassassin.org TXT 158203

I haven't decided if that needs to be more machine-parsable for future expansion, i.e. something like:

  v=1 ver=154023

I can't think of anything offhand that would need to go in there, so just a version number is probably ok.

For the initial request, mirrors.channel is a TXT record with a URL for the MIRRORED.BY file (i.e. http://spamassassin.apache.org/updates/MIRRORED.BY), which contains a list of parent URLs and an optional list of options per mirror.
i.e.:

  http://spamassassin.apache.org/updates weight=20
  http://spamassassin.kluge.net/updates
  http://somemirror.example.com/spamassassin/updates weight=4

means there are 3 mirrors, weighted so the apache.org one will be used the most (80% of the time), followed by the example.com one (16% of the time), followed by the kluge.net one (4% of the time). Weights default to 1, btw.

The directory that is to be mirrored out looks like this:

  dir/
    MIRRORED.BY
    version.ext
    version.ext.sha1
    ...
    versionn.ext
    versionn.ext.sha1

with version.ext.gpg .. versionn.ext.gpg available optionally. I don't think GPG needs to be required, but for the paranoid amongst us, it needs to be available as an option.

At the end, the script outputs a number of channel.cf files, which by default will just be read by SpamAssassin at startup (leaving restarting spamd up to the admin outside the script, based on exit code...). If a different directory is used, the admin can simply include the channel.cf file in their local.cf.

There are a few things I haven't fully fleshed out yet:

1) How to archive the update files together? I envisioned a similar naming convention to our normal rules directory (i.e. a bunch of files named ##_type.cf), but the script should just expect to download a single file, which will then be expanded. I don't want to rely on system calls to run an expansion, nor do I want to expect tar or zip to be installed, etc.

2) How to validate with GPG? Similar to the archive issue. Perhaps using GnuPG::Interface? It's really just a wrapper around running gpg from the command line, but it at least abstracts the issue for platforms where gpg isn't what I think it is.

3) Using channel.cf means that it may or may not come after local.cf. We should probably use some form of prefix to get it to load beforehand, but what? People should be able to override the channel config if they want to. I don't know if I want AA_updates_spamassassin_org.cf as a file.
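The MIRRORED.BY weighting described above (weights default to 1; 20 + 1 + 4 = 25, hence 80%, 4% and 16%) can be sketched as a cumulative-weight pick. This is a hypothetical shell/awk helper, not the update script's actual code; the random fraction is passed in as an argument so a given choice can be reproduced.

```shell
#!/bin/sh
# pick_mirror FILE U: choose a mirror from a MIRRORED.BY-style file, where
# each line is "URL [weight=N]" and a missing weight counts as 1.
# U is a number in [0, 1); passing it in keeps the choice reproducible.
pick_mirror() {
  awk -v u="$2" '
    BEGIN { n = 0; total = 0 }
    NF == 0 { next }                      # skip blank lines
    {
      w = 1
      for (i = 2; i <= NF; i++)
        if ($i ~ /^weight=[0-9]+$/) w = substr($i, 8) + 0
      url[n] = $1; weight[n] = w; total += w; n++
    }
    END {
      r = u * total                       # a point in [0, total)
      for (i = 0; i < n; i++) {
        if (r < weight[i]) { print url[i]; exit }
        r -= weight[i]
      }
      if (n > 0) print url[n - 1]         # guard against u rounding up to 1
    }' "$1"
}

mirrors=$(mktemp)
cat > "$mirrors" <<'EOF'
http://spamassassin.apache.org/updates weight=20
http://spamassassin.kluge.net/updates
http://somemirror.example.com/spamassassin/updates weight=4
EOF
u=$(awk 'BEGIN { srand(); print rand() }')
pick_mirror "$mirrors" "$u"   # apache.org ~80%, example.com ~16%, kluge.net ~4%
rm -f "$mirrors"
```

Drawing a fresh fraction per run spreads requests in proportion to the weights; the final fallback only matters if the printed fraction rounds up to 1.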
Pseudo-code:

- Script has a list of GPG keys which are allowed to sign update releases. The default is 265FA05B, which is the SA signing key.
- load Mail::SpamAssassin
- load Digest::SHA1
- load LWP
- Accept command-line options for GPG keys to allow for signing in addition to the default (for third-party updates).
- Accept a command-line option for whether or not to use GPG for verification.
- Accept command-line options for additional channels to use beyond updates.spamassassin.org.
- Accept a command-line option for the parent directory for updates. Default is whatever the first site_rules_path value is, i.e. /etc/mail/spamassassin, à la:
    $msa->first_existing_path (@M::SA::site_rules_path);
- Accept other options such as debug, version, etc.
- exit code = 255
- foreach ( @channels ):
  - Convert channel name to a platform-friendly version? Is foo.bar.baz.etc.example.com ok for all platforms? I was thinking s/\./_/g

+1 on that.

  - read
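Two steps of the pseudo-code above can be sketched in shell with hypothetical helper names: the s/\./_/g channel-name conversion, and checking a downloaded file against its .sha1 companion. The sketch assumes the digest is the first whitespace-separated field of the .sha1 file (the format sha1sum writes) and uses GNU coreutils' sha1sum in place of Perl's Digest::SHA1.

```shell
#!/bin/sh
# Hypothetical helpers for two steps of the per-channel loop.

# Platform-friendly name for a channel: replace every "." with "_",
# the shell equivalent of s/\./_/g.
safe_channel_name() {
  printf '%s\n' "$1" | tr . _
}

# verify_sha1 FILE SUMFILE: succeed if FILE's SHA-1 matches the digest in
# the first whitespace-separated field of SUMFILE (sha1sum's output format).
verify_sha1() (
  expected=$(awk '{ print tolower($1); exit }' "$2")
  actual=$(sha1sum "$1" | awk '{ print $1 }')
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
)

safe_channel_name updates.spamassassin.org     # updates_spamassassin_org

update=$(mktemp)
printf 'score EXAMPLE_RULE 1.0\n' > "$update"
sha1sum "$update" > "$update.sha1"
verify_sha1 "$update" "$update.sha1" && echo "checksum ok"
rm -f "$update" "$update.sha1"
```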
Re: RFC: Plan for faster updates
Robert Menschel writes:
TVD Versions are tracked per channel and SpamAssassin version. To check
TVD for updates, do a DNS TXT query ala
TVD z.y.x.updates.spamassassin.org,
TVD where z.y.x refers to the version of SpamAssassin being used, aka:
TVD x.y.z for 3.0.2, etc. For simplicity, wildcards can be used on the
TVD DNS server to match a whole set of releases. An example:
TVD *.0.3.updates.spamassassin.org TXT 154203
TVD *.1.3.updates.spamassassin.org TXT 158203

And I assume that *.*.3 would also be viable to accept rules for all 3.x.x versions, or more to the point, *.*.2 could be used within SARE to flag rules that apply to all 2.xx versions that predate 3.0.0.

Hold on -- something's just occurred to me from the SPF development; this won't be possible, because BIND doesn't support bar.*.foo wildcards (i.e. wildcards in a non-lowest-level record). We may have to have a way to explicitly mark wildcards. In other words, do lookups like

  1.0.3.updates.spamassassin.org
  star.0.3.updates.spamassassin.org
  star.3.updates.spamassassin.org
  star.updates.spamassassin.org   (possibly N/A)

--j.

TVD The directory that is to be mirrored out appropriately looks like:
TVD dir/
TVD MIRRORED.BY
TVD version.ext
TVD version.ext.sha1
TVD ...
TVD versionn.ext
TVD versionn.ext.sha1
TVD with version.ext.gpg .. versionnn.ext.gpg available optionally.
TVD I don't think GPG needs to be required, but for the paranoid
TVD amongst us, it needs to be available as an option.

Where do these updates come from? When would the GPG signature be applied, and by whom/what? Within SARE we have multiple working files, and I can see our scripts combining all files that match given criteria into a single channel file. The original files are sometimes signed to validate them, but I don't see any value to having an automated script sign the compilation. I suppose it might be a YMMV situation.

yep.
At the least, this serves to avoid someone subverting a mirror and putting up their own files without at least stealing the signing key too. It's definitely a good idea.

TVD At the end, the script outputs a number of channel.cf files,
TVD which by default will just be read by SpamAssassin at startup
TVD (leaving restarting spamd up to the admin outside the script,
TVD based on exit code...) If a different directory is used, admin
TVD can simply include the channel.cf file in their local.cf.

Good.

TVD There are a few things I haven't fully fleshed out yet:
TVD 1) How to archive the update files together? I envisioned a
TVD similar naming convention to our normal rules directory (ie: a
TVD bunch of files named ##_type.cf), but the script should just
TVD expect to download a single file which will then be expanded. I
TVD don't want to rely on system calls to run an expansion, nor do I
TVD want to expect tar or zip to be installed, etc.

I would think that the compilation script could simply cat the component files together. E.g. [I often use shell as my meta language]:

    version=$mmddhhss   # simple version calc
    # loop through compilation definition files.
    # For each definition, grab output file name from line 1.
    # Remainder of lines name files fed into compilation.
    for compilefile in $compiledir/*.compile ; do
        outfile=$( sed 1q "$compilefile" )
        newer=no   # assume this compilation not updated
        # For each file in the compilation, check to see if it is newer
        # than the last compilation built.
        for infile in $( sed -n 2,\$p "$compilefile" ) ; do
            if [[ $infile -nt $outfile ]]
            then
                newer=yes
            fi
        done
        # If any input file is newer than the last compilation built,
        # then build a new compilation: version line first, then the files.
        if [[ $newer = yes ]]
        then
            echo "$version" > "$outfile"
            cat $( sed -n 2,\$p "$compilefile" ) >> "$outfile"
        fi
    done

TVD 3) Using channel.cf means that it may or may not come after
TVD local.cf. We should probably use some form of prefix to get it to
TVD load beforehand, but what?
TVD People should be able to override the
TVD channel config if they want to. I don't know if I want
TVD AA_updates_spamassassin_org.cf
TVD as a file.

I would agree that we want all channel files to come before local.cf alphabetically, and also want them to have reasonably short names. What about a name like CH.$channel.$abbr.cf, where $channel is the channel file name (e.g. updates, scores, hispamnoham, etc.), and $abbr is an abbreviation for the source of that channel (perhaps fed through a second field on line 1, or through the second line of the channel file)? That would give us files like:

    CH.updates.SA.cf
    CH.scores.SA.cf
    CH.hispamnoham.SARE.cf

This leaves open the question of how we prioritize the occasional override. Let's say SARE includes an english channel, containing our rules
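The explicit-wildcard workaround suggested in this thread (querying a literal "star" label instead of relying on BIND wildcards in a non-lowest-level position) amounts to generating a fixed list of names, most specific first. A sketch with a hypothetical function name; presumably a client would query each name for a TXT record (e.g. with dig) and use the first that answers, which this code does not attempt.

```shell
#!/bin/sh
# update_lookup_names VERSION [ZONE]: print the names to query for an
# x.y.z version, most specific first, using a literal "star" label
# instead of a DNS wildcard for the less specific levels.
update_lookup_names() (
  zone=${2:-updates.spamassassin.org}
  set -f                 # no globbing while splitting
  IFS=.
  set -- $1              # "3.0.1" -> $1=3 $2=0 $3=1
  printf '%s\n' "$3.$2.$1.$zone" "star.$2.$1.$zone" "star.$1.$zone" "star.$zone"
)

update_lookup_names 3.0.1
# 1.0.3.updates.spamassassin.org
# star.0.3.updates.spamassassin.org
# star.3.updates.spamassassin.org
# star.updates.spamassassin.org
```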
Re: [Bug 4124] New: New spamassassin script doesn't work due to tainting
Daniel Quinlan writes:
Malte S. Stretz [EMAIL PROTECTED] writes:
I'll fix this (it needs to be done via the B_FOO (build) and I_FOO (install) hacks).

Thanks, I sent a few comments in my last message. ;-)

Just to be sure: spamassassin is always in the same dir as sa-filter? So the symlink can be spamassassin -> sa-filter and doesn't have to contain the absolute path (which is impossible)?

Yes, it will always be in the same directory.

Maybe we should require a new separate file under build for the MY stuff to remove some of the base code (as opposed to the build instructions).

+1 -- I think that's a very good idea.

--j.
Re: Broken .htaccess in spamassassin.apache.org
Sander Striker writes:
Subject says it all. Please fix ASAP.

did it get fixed? appears to be working now. the contents are:

  Redirect /doc http://spamassassin.apache.org/full/3.0.x/dist/doc
  Redirect /downloads.html http://spamassassin.apache.org/downloads.cgi?update=200409211830
  Redirect /favicon.ico http://spamassassin.apache.org/images/favicon.ico

Sander, what URL did you see failures on?

--j.
Re: Re[2]: RFC: Plan for faster updates
Robert Menschel writes:
Hello Theo,

Saturday, February 12, 2005, 11:21:16 PM, you wrote:

3) Using channel.cf means that it may or may not come after local.cf. We should probably use some form of prefix to get it to load beforehand, but what? People should be able to override the channel config if they want to. I don't know if I want AA_updates_spamassassin_org.cf as a file.

TVD I haven't come up with anything for this yet.

Since hit-frequencies requires numeric prefixes to give us stats concerning hit ratios, and since 60_whitelist.cf is the highest-numbered file in the distribution, I'd suggest maybe 65.$channel.cf for all channel files? Or use 65.update.cf for the distribution channel, and let other channels supply the numeric prefix as part of their channel name?

actually, I just fixed the bug that required the numeric prefixes last week, so that's no longer a problem ;)

--j.
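Both naming proposals in this thread, CH.$channel.$abbr.cf and 65.$channel.cf, rely on sort order to place channel files ahead of local.cf. A sketch that shows the resulting order, assuming the files share a directory and are read in byte-wise (C locale) sorted order; that reading order is an assumption, not something the thread confirms.

```shell
#!/bin/sh
# Print config file names in byte-wise (C locale) order.
sorted_cf_names() {
  printf '%s\n' "$@" | LC_ALL=C sort
}

sorted_cf_names local.cf CH.updates.SA.cf CH.hispamnoham.SARE.cf 65.update.cf
# 65.update.cf
# CH.hispamnoham.SARE.cf
# CH.updates.SA.cf
# local.cf
```

Digits sort before uppercase letters, which sort before lowercase ones, so either prefix lands before local.cf under this ordering.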
Re: Trie optimisation of simple alternations for blead perl.
demerphq writes:
As you can see, except for the _construct benchmarks, B wins by a large margin. The _construct tests are designed to see what the overhead is of constructing the trie for nothing (i.e. the match is at ^), and show that the construct time is half as fast (this is unsurprising, as the entire cost of A must be carried by B, since the optimisation doesn't occur until study_chunk()). OTOH the parse times are much better. perl_keywds searches for a list of words like perl keywords in the bench script that comes as part of perlbench, and shows that for this type of matching the trie is much, much faster than the current mechanism.

FWIW, this looks like it'd be excellent for SpamAssassin ;) I haven't had much time to look over the implementation, and I'm not really any use for reviewing it from a p5p POV due to lack of familiarity with perl internals, but the benchmark figures look fantastic and the implementation details sound good. I'd love to see this get into perl, even if just as an option enabled through a "use" pragma. (In my opinion, if your regexp will benefit from a trie, you will know that in advance.)

--j.