Re: Being more aggressive about finding URLs in messages

2004-09-14 Thread Loren Wilton
(I hope this reply makes it to the right place, I'm using Earthlink's webmail, which is less than intuitive about where replies will go.) We could potentially be more aggressive, but the problem becomes FP rates. You can look for anything like \w+\.\w+, but then things like Run command.com
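The false-positive concern above can be seen with a very small sketch. It is written in Python rather than SpamAssassin's Perl, and the function name and sample strings are assumptions made up for illustration.

```python
import re

# Deliberately loose pattern from the message: anything shaped like word.word
LOOSE_URL = re.compile(r"\w+\.\w+")

def loose_candidates(text):
    """Return every substring the loose pattern would treat as a possible URL."""
    return LOOSE_URL.findall(text)

# A DOS program name is picked up just as readily as a bare domain.
print(loose_candidates("Run command.com"))
print(loose_candidates("Visit example.com today"))
```

Both strings yield one candidate each, which is the point: the pattern alone cannot tell a file name from a host name.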

Re: [Bug 3675] [review] pick a project logo

2004-09-14 Thread Loren Wilton
I have to say, my favorite is definitely 4.3h, but with the smooth arrow, not the cut and pasted one. :-) 3 - Define 1 (ONE) color scheme from the ones I created. 4.3h 4.3g 4.3f Hi Daniel. I mucked with 4.3h a little bit to get it a little closer to what I would like; see attached. If

Re: class renaming

2004-09-26 Thread Loren Wilton
- rename the current Mail::SpamAssassin::PerMsgStatus class to Mail::SpamAssassin::Scan Personally I'd prefer Mail::SpamAssassin::Assassinate. No, it doesn't make any obvious sense, but a project isn't worthwhile if you can't have one class somewhere with a fun name that everyone just

Re: speedup for PerMsgStatus

2004-09-28 Thread Loren Wilton
so I'm thinking that we should replace parts of this with arrays, using integer indexes, instead of hashes with string indexes. Array lookups are quite a bit faster than hash lookups. I have no idea how painful linked lists are in Perl (or if they even exist). But if you are essentially

Re: speedup for PerMsgStatus

2004-09-28 Thread Loren Wilton
I have no idea how painful linked lists are in Perl (or if they even exist). Why are you commenting then??? Because they are very useful, as I pointed out. They don't exist as a native data structure. Arrays are fast, painless, and dynamically sized. They don't exist as a native data

Re: [Bug 3340] RFE: --check-for-errors

2004-10-02 Thread Loren Wilton
Well, we were thinking --lint reports errors, --lint --debug or -D - --lint would report errors and warnings (since warnings would be generated as debug-level messages from --lint). But -D in general throws TONS of messages whether anything is broken or not, making it virtually necessary to

Re: [Bug 3864] New rule submission - SARE_MULT_RATW_02

2004-10-04 Thread Loren Wilton
Looks very interesting. The \1 is a performance-killer, though. This really needs to be implemented as an eval rule... something like sub check_whatever { my ($self) = @_; my $mid = $self->get('MESSAGEID'); if ($mid =~ m/[A-Z]{28}\.(.+?)/) { my $from = $self->get('From');

Re: limit on number of URIs decoded?

2004-10-13 Thread Loren Wilton
however: 100 URLs is pretty low. it's worth noting these are the *first* 100 URLs found in the message, but still -- there may be a way a spammer could overload this and get past SpamAssassin by loading up 100 URLs before their payload URL. thoughts? Possibly you could prioritize urls

Re: speeding up SpamAssassin

2004-10-20 Thread Loren Wilton
I find it interesting that the processor usage is (head tests, eval tests, body tests) in that order. I would normally expect headers to be no larger than the body in most cases. This implies that either my assumption is wrong, or head tests are more complex, or there are more head tests than body

Re: [Bug 3949] ALL_TRUSTED misfires when Received: parsing fails.

2004-11-05 Thread Loren Wilton
An alternate simple case to detect local mail delivery would be to count the received headers, whether they can be parsed or not, and are trusted or not. If #received-hdrs = 1 and trusted==0 and untrusted==0, assume local delivery and trust it. Probably should be a configurable option if done at
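A rough sketch of the heuristic described above, in Python for illustration only. The function and parameter names are hypothetical, and the condition is read literally as exactly one Received header with no relays classified as trusted or untrusted.

```python
def assume_local_delivery(received_headers, trusted_count, untrusted_count):
    """Guess local delivery from header counts alone.

    received_headers: raw Received header strings, parsed or not (assumption).
    trusted_count / untrusted_count: relays the parser managed to classify.
    """
    return (len(received_headers) == 1
            and trusted_count == 0
            and untrusted_count == 0)

# One unparseable Received header and nothing classified: treat as local.
print(assume_local_delivery(["from localhost by localhost"], 0, 0))
```

As the message suggests, a check like this would probably sit behind a configuration option rather than be always on.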

Re: [Bug 3950] [review] Exim $sender_fullhost not recognised by Received header parser

2004-11-06 Thread Loren Wilton
I think I'm going to come down on the other side of this from Tony, and from the wontfix closure on 3650 or whatever it was. General philosophy: if it is easy to f*** it up in Exim, and easy to correctly parse the f**'ed results, go ahead and do it. (Alternately, have a debug or even a

Re: Can anyone here write some plain English?

2004-12-06 Thread Loren Wilton
Doesn't the free VC install include nmake? The normal one does. The DDK also includes Nmake, and a considerably newer version than what comes with the standard VC++ 6.0. Unfortunately the current DDK only comes on CD these days, however it is still free, save postage and time. Loren

Re: [Bug 4040] 30_text_de.cf brings up error message

2004-12-18 Thread Loren Wilton
I do not agree with this conclusion. As I already commented on another bug ([Bug 3085] TRACKER_ID rule not very useful) some languages simply use longer words/sentences (on average) than English. Having no short and accurate translations of many/most computer related English terms complicates the

Re: New subproject, BlogSpamAssassin

2004-12-23 Thread Loren Wilton
Someone a few months back already implemented a way to integrate SA with at least one of the blog tools, and I think reported that it helped a lot. This was just using normal SA filtering, I believe along with a modified rule base. I think this implementation was pre-3.0, or no later than the

Re: A Feature I've always wanted - Test for multiple hits on same rule

2004-12-28 Thread Loren Wilton
Any thoughts on this? For certain rules I think it would be a great idea. The trick is not so much running the rule globally, as it is getting the hit count to use in the score generation. I don't think (although I may be wrong) that Perl can tell you how many times a regex hit in a global
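To make the hit-count idea concrete, here is a small Python sketch (not Perl, and not how SpamAssassin scores rules). The capped scoring function is purely a hypothetical illustration of how a count might feed into a score.

```python
import re

def hit_count(pattern, text):
    """Count non-overlapping matches of pattern in text."""
    return sum(1 for _ in re.finditer(pattern, text))

def scaled_score(base_score, hits, cap=3):
    """Hypothetical scoring: base score times the hit count, capped at `cap` hits."""
    return base_score * min(hits, cap)

body = "free offer, free gift, free trial, free shipping"
hits = hit_count(r"\bfree\b", body)
print(hits, scaled_score(0.5, hits))
```

The cap is one possible answer to the worry that a single rule hitting many times could dominate the total.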

Re: A Feature I've always wanted - Test for multiple hits on same rule

2004-12-28 Thread Loren Wilton
I'd have to take this into account when optimising the scores. Then, since the scores would be optimised for multiple hits, spammers would only have to reduce the number of hits to evade SpamAssassin. This strikes me as more of an implementation problem than an argument against the concept.

Re: A Feature I've always wanted - Test for multiple hits on same rule

2004-12-28 Thread Loren Wilton
If I don't take it into account when optimizing the scores, then the increased scores will cause more false positive errors. What you probably need to take into account is the cumulative score for the rule in the test corpus. Which of course you do for all rules. The only oddity you would

Re: svn commit: r124477 - /spamassassin/trunk/lib/Mail/SpamAssassin/EvalTests.pm /spamassassin/trunk/rules/20_body_tests.cf /spamassassin/trunk/rules/70_testing.cf

2005-01-08 Thread Loren Wilton
oh good, so you've changed your mind since http://bugzilla.spamassassin.org/show_bug.cgi?id=3781#c3 then ;) Somewhat. I still think it should be a plugin. There's a problem with plugins I hadn't realized when they were originally being advertized as the universal solution to oddball rules.

Re: SURBL whitelist volume chicken-egg problem

2005-01-15 Thread Loren Wilton
I was thinking along the lines of something that SpamAssassin downloads once a month, or queries to find out if there is an update available and only downloads if there is. Since the idea is to limit DNS queries, of While it isn't part of the official SA project, this sounds like exactly a job

Re: svn commit: r125369 - /spamassassin/trunk/rules/70_scraped.cf

2005-01-17 Thread Loren Wilton
they solved and I'm sick to death of them! :(Some sensible wrapping code would be simpler, and save EVERYONE a lot of trouble. I don't think wrapping code is a solution at all. Email is fundamentally 80 columns. Names that go over about a quarter of that length mean that the

Re: [Bug 3997] SURBL FP on a particular domain name

2005-01-19 Thread Loren Wilton
next if ($answer->type ne 'A' && $answer->type ne 'TXT'); # skip any A record that isn't on 127/8 next if ($answer->type eq 'A' && $answer->rdatastr !~ /^127\./); Shouldn't that prevent what Vance's comment #23 debug log output shows even if the wrong query's results were associated with a
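The two filters quoted above can be restated as a small predicate. This is a Python paraphrase for illustration, not the Net::DNS code itself; the function name and the plain-string arguments are assumptions.

```python
def keep_answer(rr_type, rdata):
    """Return True if a DNS answer would survive both quoted filters."""
    # Skip anything that is neither an A nor a TXT record.
    if rr_type != "A" and rr_type != "TXT":
        return False
    # Skip any A record that isn't on 127/8.
    if rr_type == "A" and not rdata.startswith("127."):
        return False
    return True

print(keep_answer("A", "127.0.0.2"))   # blocklist-style answer is kept
print(keep_answer("A", "192.0.2.1"))   # ordinary address is dropped
```

Note that a TXT record passes regardless of its content, so the filters alone do not rule out a mismatched TXT answer.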

Re: svn commit: r125722 - /spamassassin/trunk/MANIFEST /spamassassin/trunk/lib/Mail/SpamAssassin/Conf/Parser.pm /spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm /spamassassin/trunk/t/SATest.pm /spamassassin/trunk/t/desc_wrap.t

2005-01-20 Thread Loren Wilton
I agree with Daniel that the new formatting isn't real nice; and in the cited example I can't see any reason for the formatting change. However I also strongly agree with Justin's reasoning and stated limits on doing this. I think I regard this as bugs in the new formatting code that can be

A reason to be able to write tests against MIME section headers...

2005-01-23 Thread Loren Wilton
Hum, what's wrong with this encoding... --=_NextPart_000_91FF8_43B69930.6DEB20A0 Content-Type: text/plain; charset=iso3 8 2isyw34 8 8udg

Re: uridnsbl: bogus rr run ...

2005-04-25 Thread Loren Wilton
How about a simple debug printout of the id value sent and the id value received? Maybe it is as simple as the id matching code is failing. Loren

Re: uridnsbl: bogus rr run ...

2005-04-27 Thread Loren Wilton
How big is the masscheck run? Probably lots of messages and lots of requests? I'm betting at some point the client pipe number cycles. It is limited to a range significantly smaller than 2^16, but I don't recall the exact range limits. How, why this should make you start seeing duplicates I

Re: uridnsbl: bogus rr run ...

2005-04-27 Thread Loren Wilton
$self->{_initted} = $$; srand; $self->{resolver}->reinit_post_fork(); In the C/C++ world, srand has a parameter. I suspect the C srand() function underlies perl's srand statement. Depending on the value of the parameter that Perl is deducing from that statement, you may or may

Re: Bayes scores for 3.0.3

2005-04-27 Thread Loren Wilton
I'd certainly vote for that, if I had a vote! Loren

Re: [Bug 4314] Determine how to merge init.pre files from release to release

2005-05-11 Thread Loren Wilton
How about making due diligence easier when you KNOW there is a probable change required (ie: the need to add a plugin line to maintain default behaviour that didn't require such a line in the previous release(s)). At the end of make install or some reasonable place, run a scan of init.pre, or

Re: svn commit: r169589 - /spamassassin/trunk/lib/Mail/SpamAssassin/Conf/Parser.pm

2005-05-11 Thread Loren Wilton
I believe other delimiters are legal. Indeed. I commonly write m'stuff'i if I'm going to be matching slashes. BTW, I also will commonly write things like =~ /BADWord/ # no /i to make it clear that, no, I *did not* forget to make that test case insensitive. Depending on what leads

Re: libspamc interface

2005-05-14 Thread Loren Wilton
I prefer uint for bitmasks. Other than that it seems fine. I like the idea of going to tell or some such; 'collabreport' gives me the shudders. Loren

Re: Weekly net run, still has issues

2005-05-22 Thread Loren Wilton
How interesting. Six of em, and all in a row! Loren

Re: Error message building with dev version of Net::DNS

2005-06-20 Thread Loren Wilton
It sounds like whenever we check for version, we should do something like: $version =~ s/_\d.*$//; Why not s/_/\./ instead? Or if you want to treat it as a float, just drop the underscore or replace it with a zero? Either of those would preserve the fact that you have a fractionally higher
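The alternatives discussed above can be compared side by side. The sketch below uses Python equivalents of the quoted Perl substitutions; the version string "0.49_01" is a made-up example, not a real Net::DNS release.

```python
import re

def strip_dev_suffix(version):
    # Equivalent of the quoted s/_\d.*$// : discard the developer suffix.
    return re.sub(r"_\d.*$", "", version)

def underscore_to_dot(version):
    # Equivalent of s/_/\./ : keep the suffix as another dotted component.
    return version.replace("_", ".", 1)

def drop_underscore(version):
    # Drop the underscore so the result still parses as a float.
    return version.replace("_", "", 1)

v = "0.49_01"  # hypothetical developer version
print(strip_dev_suffix(v), underscore_to_dot(v), drop_underscore(v))
# Dropping the underscore keeps the dev version fractionally higher.
print(float(drop_underscore(v)) > float(strip_dev_suffix(v)))
```

Stripping the suffix makes the developer build compare equal to the release before it, which is the information the reply argues should be preserved.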

Re: Normalized text ruletype

2005-06-21 Thread Loren Wilton
Title: RFC: Normalized text ruletype Wow, neat! I've been looking at something like this for quite some time. Adding in pipes and some of the other characters known to be used for obfuscations could well drastically increase your hit ratios, they are really common. I think this is quite

Re: rule syntax question

2005-06-23 Thread Loren Wilton
Is this the right forum for this question? The user's list would have been the more appropriate place. body DRUG_ED_CAPS /\bCIALIS|LEVITRA|VIAGRA/ according to my resident regex expert will only look for the \b in front of the CIALIS, and not in front of LEVITRA or VIAGRA Seems
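The precedence point quoted above is easy to check. Alternation binds more loosely than \b, so the anchor attaches only to the first alternative unless the alternatives are grouped. The sketch uses Python's re module, which follows the same precedence as Perl here; the grouped pattern is one possible rewrite, not necessarily the one the poster ended up with.

```python
import re

UNGROUPED = re.compile(r"\bCIALIS|LEVITRA|VIAGRA")
GROUPED = re.compile(r"\b(?:CIALIS|LEVITRA|VIAGRA)")

def matches(pattern, text):
    """True if the compiled pattern matches anywhere in text."""
    return pattern.search(text) is not None

sample = "xxLEVITRAxx"  # the word is buried inside a longer token
print(matches(UNGROUPED, sample))  # matches: no \b is required before LEVITRA
print(matches(GROUPED, sample))    # no match: \b now applies to every alternative
```

Whether the boundary is wanted on all three words is a rule-design decision; the grouping only makes the intent explicit.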

Re: autofoo guru needed :)

2005-06-28 Thread Loren Wilton
I just remembered that I used strchr(3) in my last commit to spamc and according to the man page is that one part of C99, so might be missing on some system (?). FWIW, I don't believe I've *ever* seen a C/C++ implementation for any real system that didn't have strchr, all the way back to the

Re: [Bug 4470] Leading 0 in scores

2005-07-08 Thread Loren Wilton
_SCORE(PAD)_ message score, if PAD is included and is either spaces or zeroes, then pad scores with that many spaces or zeroes There may be another bug here. I tried a few days ago using a pad of ( 00) and ended up with the space ignored. What I wanted

Re: 3.10pre4 tests much better (almost complete) under Cygwin

2005-07-14 Thread Loren Wilton
Currently SpamAssassin 3.10pre4 gets ALMOST all the way through the tests. Failed Test Stat Wstat Total Fail Failed List of Failed -- -- --- t/meta.t 21 50.00% 1 Then you are clean, this is a

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-19 Thread Loren Wilton
A big part (perhaps the biggest part) of rules development is the mass check. Most anyone can develop a rule on their home system and see how they *think* it works. Some few (but not many) people can do a mass-check on their home system and see how it *really* works - *for them*. As proposed,

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-19 Thread Loren Wilton
As rules are put into the sandboxes, they become part of svn. When the nightly mass-checks are run, each person pulls the latest rules sandboxes from svn and does their mass-check with all of those, then rsyncs the results back up to the central site once the mass-check completes. I think I

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-19 Thread Loren Wilton
Duncan wrote: I think the first point is the bigger one. Ultimately, Dan's sandbox proposal may solve part of the not enough rules problem by making it easier for people to contribute rules. But I'd like to hear from potential rule submitters -- would this be a step in the right direction? Is

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-20 Thread Loren Wilton
I'd like to see if there's a way to combine the two somehow so that new SVN commits that update sandbox rules, are immediately mass-checked alone. However, I can't see a way to do that reliably from SVN commits alone, because (for example) meta rules may depend on other rules that were not

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-20 Thread Loren Wilton
What I miss most is a transparent dataset about every rule. I'd like to know - percentage of false positives - percentage of false negatives - percentage of true positives - percentage of true negatives - number of mails checked for the results above - standard deviation of the percentages
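As a sketch of what such a per-rule dataset might contain, the Python function below derives the four percentages from raw hit counts. The definitions are assumptions: a hit on ham counts as a false positive, a miss on spam as a false negative, and each percentage is taken over all mails checked. The standard deviation is left out because it needs per-corpus figures that a single run does not provide.

```python
def rule_stats(hits_spam, hits_ham, total_spam, total_ham):
    """Per-rule confusion percentages over all checked mails (assumed definitions)."""
    total = total_spam + total_ham
    return {
        "true_pos_pct": 100.0 * hits_spam / total,
        "false_neg_pct": 100.0 * (total_spam - hits_spam) / total,
        "false_pos_pct": 100.0 * hits_ham / total,
        "true_neg_pct": 100.0 * (total_ham - hits_ham) / total,
        "mails_checked": total,
    }

# Made-up counts: the rule hits 400 of 1000 spam and 10 of 1000 ham.
print(rule_stats(400, 10, 1000, 1000))
```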

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-20 Thread Loren Wilton
Sidney writes: Perhaps we could use SVN to check in rule submissions so they are version controlled and tracked, and have emails refer to file paths and version numbers instead of attaching the rules. Would that be too complex for the people we want to attract compared to mailing in sets of rules

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-20 Thread Loren Wilton
Could the list be a semi-private one, with moderated subscription and posting? That'd take care of rules in development being exposed to spammers while they're still being worked on, at least partially. The SARE list is private and invitation only for exactly these reasons. You don't want to

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-20 Thread Loren Wilton
Sidney writes: Dealing with metarules and modifications to them presents a problem in any case. How do we deal with person X submitting a modification to metarule A and proposed rule A1, while person Y submits a different modification to metarule A and proposed rule A2 while person Z submits

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-20 Thread Loren Wilton
Dealing with metarules and modifications to them presents a problem in any case. How do we deal with person X submitting a modification to metarule A and proposed rule A1, while person Y submits a different modification to metarule A and proposed rule A2 while person Z submits proposed

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-20 Thread Loren Wilton
--- I guess that part of making the rule submission and test process nimble is for the submitted rules to be independent of anything else. That makes changing metarules less of a nimble process. That's fine, because metarules are really just an optimization which can be implemented after the fact

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-21 Thread Loren Wilton
Duncan earlier enscribed: Masscheck has an interdependency option, although it increases the checking time. We use it on rules once they seem useful, but not usually in early one-off checking. I'm not sure what you mean by this. We have an overlap script which does some of this -- is that

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-21 Thread Loren Wilton
I'm *really worried* about proposals that involve mailing lists that have only private archives and require moderator approval for subscription. It just doesn't feel right for an open source project. I understand the feeling. I'm trying to balance the obvious desire for a completely public

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-21 Thread Loren Wilton
I guess you'd have better data than I would; but I'm still having trouble believing that Spammers are adjusting on that time frame. Some do; not all do. However, the ones that can adjust in less than a day, or maybe less than 2-3 days sometimes, tend to be some of the more prolific spammers.

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-21 Thread Loren Wilton
May I help? (How will you folks decide) Well, to paraphrase how we decide in SARE -- do something, we'll watch. And it really is pretty much that simple. I expect (and this is personal opinion, I'm not an SA dev) that the rules subproject will sooner or later consist of anointed

Re: [Bug 4497] New: reorganise PerMsgStatus code

2005-07-23 Thread Loren Wilton
I know user rules aren't real popular with the sa dev community, however that attitude isn't universally shared by sa users. Therefore may I suggest: Would it be possible when reorganizing things to come up with some semi-persistent storage for compiled user rules, so that they don't have to be

Re: compiled user rules

2005-07-24 Thread Loren Wilton
it's not a matter of popularity -- it's a matter of being horrendously difficult to support. I grant from what I've seen of PMS that this gets pretty ugly. Or at least it seems to to me, but then a lot of apparently good Perl looks pretty ugly to me. ;-) But I'm a C++ and Algol programmer,

Re: Re[2]: Hackathon summary

2005-07-25 Thread Loren Wilton
That's why we use 70_sare_name_eng.cf files, to indicate that these rules work well only on systems which expect almost 100% English ham, and little to no ham in other languages. I've begun to wonder whether it might be worth while having 50_scores.cf for English emails, and then

Re: Re[4]: Hackathon summary

2005-07-26 Thread Loren Wilton
How would we determine ham/spam? At this point all we have is SA's first estimation, and no way of knowing whether this is accurate, FN, or FP. All we could reasonably do is take SA's assessment of the message and assume that statistically it will be correct to one or two sigma or so. If the

Re: Re[4]: Hackathon summary

2005-07-26 Thread Loren Wilton
More thought ... what if SA systems were to accumulate daily statistics, along the lines of one record for each rule, containing: That sounds like the general sort of vague idea I had, fleshed out in more detail. Certainly the desirable goal is basically: 1 does this rule hit anything? 2 does

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-27 Thread Loren Wilton
Example: I am currently writing a very FEW rules, some from scratch and some by adapting the work or ideas of others from such lists or web sites. You have all convinced me that if I post a rule for discussion that it is then close to worthless. It depends on how you post it. And it may

Re: PerMsgStatus

2005-07-29 Thread Loren Wilton
a) what the heck are priorities, who sets them, and do they really have any justifiable purpose? Ie: can they just quietly vanish into the night with nobody being any the wiser? They order the rules -- or more correctly, sets of rules. Most rules are priority 500 (iirc), but some need

Thoughts/ramblings on rule short circuiting

2005-07-29 Thread Loren Wilton
I was thinking about the 'best' way to shortcut running rules when they weren't needed, and suddenly realized there might be cases where it is necessary to run them even though they won't determine the hammyness or spammyness of the mail. In particular, I'm wondering about bayes and awl

Re: Thoughts/ramblings on rule short circuiting

2005-07-29 Thread Loren Wilton
It seems obvious that we want to run that -100 rule first. If it hits, the maximum possible score if *every* other rule hits will be 4, and with a threshold of 5, the mail can't be spam. So we can stop after the -100 rule hits, and only run one rule on this mail. This just brought up an
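The arithmetic in the quoted example can be written as a one-line bound check. This is a Python sketch under stated assumptions: the function name is hypothetical, the other rules are assumed to sum to at most 104 points (so that -100 + 104 = 4, as in the example), and a mail is assumed to be spam only when its score reaches the threshold.

```python
def can_stop_as_ham(current_score, remaining_positive_max, threshold):
    """True if the mail cannot reach the threshold even when every
    remaining positive-scoring rule hits."""
    return current_score + remaining_positive_max < threshold

# The -100 rule has hit; all other rules together could add at most 104.
print(can_stop_as_ham(-100, 104, 5))
# Nothing has hit yet: the bound says keep running rules.
print(can_stop_as_ham(0, 104, 5))
```

The check only settles the ham/spam verdict; it says nothing about rules that need to run for side effects, which is the case the thread goes on to raise.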

Re: [Bug 4505] Score generation for SpamAssassin 3.1

2005-07-29 Thread Loren Wilton
+score BAYES_50 0 0 0.845 0.001 # n=1 +score BAYES_60 0 0 2.312 0.372 # n=1 +score BAYES_80 0 0 2.775 2.087 # n=1 +score BAYES_95 0 0 3.023 2.063 # n=1 +score BAYES_99 0 0 2.960 1.886 # n=1 I think the score for BAYES_99 should be hand tweaked, regardless of what the score generator said. This

Re: PROPOSAL: create SpamAssassin Rules Project

2005-08-01 Thread Loren Wilton
naming isn't really much of a big deal but it'd be nice to have some way to keep track of that. (not that I can think of it.) Look at some of the SARE rule files that Bob maintains. He has a formalized set of comments that get stuck to rules, and one of these can/does show the history

Re: [Bug 4513] New: outgoing mail

2005-08-03 Thread Loren Wilton
You need to ask this question on the users list. This list is to discuss spamassassin development.

Re: [Bug 4514] New: Hotmail/dav mail from Outlook Express marked as FORGED_MUA_OUTLOOK

2005-08-03 Thread Loren Wilton
Are you SURE that was a valid message? If so, it will be the first recorded instance of X-Message-Info showing up in ham and not only in spam. Previously that had been a sure sign of a spam tool generated mail.

Re: problems detecting URIs embedded in JIS encoding

2005-08-08 Thread Loren Wilton
This is quite similar to two recent bugs that caused similar problems if certain ascii characters immediately followed the URI. Spammers had exploited at least one of those cases. I don't know what the fix was for those bugs, but it may have been similar to the change you propose. Loren

Re: problems detecting URIs embedded in JIS encoding

2005-08-09 Thread Loren Wilton
Could you please point this thread at the two bug numbers? I'd like to target these for a future 3.0.5 bug-fix release, because we are very unlikely able to upgrade our Enterprise distro to 3.1 in the short to medium term. (I am hoping in the long term to have both RHEL4 and RHEL5 on

Re: initial rules organization ideas

2005-08-18 Thread Loren Wilton
Agree in general, but possibly... 2. code-tied rules stay with main tree in current rules directory with the exception of 25_replace.cf which is really just another way to write body/header rules (basically, the static stuff that is tied to code does not move to the rules project)

Re: [Bug 4547] Spamassassin not checking messages

2005-08-19 Thread Loren Wilton
How big are they? SA is set up to bypass messages over a given size.

Re: Preliminary design proposal for charset normalization support in SpamAssassin

2005-08-19 Thread Loren Wilton
The following functions, immediately after they call Mail::SpamAssassin::Message::Node::decode, need to call a function that does charset normalization. * Mail::SpamAssassin::Message::get_rendered_body_text_array * Mail::SpamAssassin::Message::get_visible_rendered_body_text_array *

Re: IPv6 in DNS-RBL. First glitch found.

2005-08-22 Thread Loren Wilton
Just looking from the sidelines, it seems the obvious answer would be to add a new namespace to the blacklist. eg: *.2.1.9.ipv6.rbl.example.org. instead of *.2.1.9.rbl.example.org. Since this is for numeric lookups, an alpha or alphanum tag in what would be the high octet of the ipv4 dotted

Re: initial rules organization ideas

2005-08-23 Thread Loren Wilton
Justin writes: I think we don't even need to do that; once we get the search directories recursively code worked out for configuration and rules, plugins will be loadable from *any* directory in the rules project: ROOT/rules/group/20_name_of_file.cf

Re: initial rules organization ideas

2005-08-23 Thread Loren Wilton
I *think* what Daniel was thinking of here, which should work, is just using the ifversion commands to conditionalize too-advanced rules. Assuming ifversion can be used in the negative also. For instance, we have one set of meta rules that use addition post-whatever, and do a less-good job

Re: Sizing a system for Spam Assassin

2005-09-07 Thread Loren Wilton
Better asked on the user's list, where there are people running systems like that. Loren

Re: Spamassassin for TREC (fwd)

2005-09-08 Thread Loren Wilton
Note also echo score MICROSOFT_EXECUTABLE 4 .spamassassin/user_prefs Isn't that a 2.6x rule that went away in 3.0? I would hope that anything comparing filtering results (as I would guess this to be, knowing nothing of it) would be using a reasonably recent version. (Of course it would

Re: [Bug 4415] Intermittent __alarm__ errors with various plugins

2005-09-08 Thread Loren Wilton
As anecdotal evidence, it's my belief that people are seeing _alarm_ log records and associated scan failures on both rc1 and rc2, and that they are occurring with more than just Pyzor. This is anecdotal however, I don't have any evidence to hand to support that. I'm personally wondering if this

Re: [Bug 4442] Lint should warn if user rules found and allow_user_rules not set

2005-09-20 Thread Loren Wilton
Well, user rules are always allowed when 'spamassassin' is run so a --lint message would have to say if you plan on using spamd your user rules won't be used. On the other hand, spamd when called with -Dconfig, will tell you it's not parsing each of your user rules. So... do we really want

Re: rules project -- a new way to do fast-turnaround mass-checks

2005-10-05 Thread Loren Wilton
Please let me know what you think! Daryl and Chris both make a number of good points, but the buildbot idea also seems to have a good deal of merit. A creative solution for the 'private corpus' problem that Chris mentions might help a lot though. Unfortunately I don't have one at the moment,

Re: BZ and rules

2005-10-13 Thread Loren Wilton
You know, I don't know if there'd be a separate bugzilla. good question... I think the most likely thing would be that the rules project stuff would be under the (existing) Rules component in BZ. I don't know that BZ would get much use or be of much use in day to day rules testing and

Re: How to use sandboxes?

2005-10-18 Thread Loren Wilton
Some random comments: So the idea is that the source code for all rules (apart from the legacy core and lang sets) remains in the sandbox dirs; in other words, there's no need to cut and paste and move around rules when they're promoted from testing status, to live core status. I'm not

Re: Bugzilla has moved!

2005-10-18 Thread Loren Wilton
Not too important, but the quip software is dumping SQL debug info: Maybe that depends on what you are doing. I tried to log in unsuccessfully: Software error:DBD::mysql::st execute failed: You have an error in your SQL syntax near '' at line 1 [for Statement "SELECT login_name FROM

Re: Bugzilla has moved!

2005-10-19 Thread Loren Wilton
Now that I can log in, I see why it isn't really important. Loren

Re: hit-rate-over-time graphs

2005-10-28 Thread Loren Wilton
Hum. Is there any way to configure some default colors for the graph? On a PC it seems Quicktime prints the thing out, and it is near unreadable. I see a black square with a straight yellow line in the center and some wiggly lines near the bottom. I *think* there might be some text in the

Re: [Bug 4594] spamd dies unexpectedly: prefork: ordered child to accept, but child reported state '1'

2005-11-07 Thread Loren Wilton
Not in my case Tom. I actually have all the Bayes features disabled and the error still happened on my installation. But do you have AWL disabled too?

Re: Nightly runs still not working right

2005-11-07 Thread Loren Wilton
I suppose mkrules could be changed to cat all the files parsed so far, so that a sandbox file can refer to a core file's rule by name (since sandbox will be compiled after core); but I quite like the side-effect of restricting sandbox files to only being able to affect rules in their own

Re: [Bug 4679] Bayes is undocumented in README, USAGE, INSTALL and UPGRADE

2005-11-10 Thread Loren Wilton
'As a collaborative documentation platform, the wiki has already proved much more effective than our SVN codebase.' So why not write a routine to scrape the Wiki on the day of release and stick the pages into files in the release tree? Loren

Re: rule promotion criteria

2005-11-12 Thread Loren Wilton
Looks generally good. Minor comments: 1. Bob had a thing built into his version of mass-check that assigns default scores. I'm not clear on the basis for this (although he has explained it any number of times) but it is fairly simple and seems to do a decent job, shy of a full scoring run. I'm

Re: 3.0.5 rescoring

2005-11-21 Thread Loren Wilton
Hello Warren, There was also a recent discussion about using SVM scoring techniques, and someone posted a tool to do that. I believe the claim was that it produced reasonable scoring with less effort than the normal method. Perhaps that could be used here? Loren

Re: tvd-evaltoplugin

2005-12-30 Thread Loren Wilton
Converting sections of tests into plugins where some people will want to disable the entire set due to performance, memory, or similar constraints (i.e., Bayes tests, network tests, special functionality, etc.) does make sense. However, converting individual (or nearly individual) tests that

Re: promoting spamtraps

2006-01-01 Thread Loren Wilton
Whether your idea is good or not, it has to do with a suggestion for how to use sa-learn, not anything to do with development. Hi Sidney, happy new year! Actually, while he phrased the RFE in terms of sa-learn, it is actually something that could be done as an SA plugin, if SA were run on the

Re: tvd-evaltoplugin

2006-01-06 Thread Loren Wilton
You can do that with the plain regex rules thanks to the experimental and rather loony (?{...}) and (??{...}) constructs. Well no. You could do that on 2.6x, and I used that for some very valuable rule development tools. That ability was removed in 3.x. Loren

Re: DATE_IN_ tests

2006-01-06 Thread Loren Wilton
anyway, I've just checked in a change that'll allow hit-rates all the way down to 0.02%. why not. ;) I guess I question active hitrates much under 1%. The key there is 'active'. Things that may be hitting next to nothing in one corpus might be hitting well in another one. Loren

Re: Security-related bugs

2006-01-11 Thread Loren Wilton
IMO, bugs which allow any specially crafted spammy message to get through, even if the method used is to crash spamd or stand-alone SA, is NOT a security bug, provided the only damage is to SA/spamd and the resulting FN. That's a bug, pure and simple, no matter how creative the spammer is.

Re: What's up with these URLs?

2006-01-11 Thread Loren Wilton
At a guess: IE and apparently Firefox have search for url enabled by default. In IE that consists of sticking .com, .net, etc suffixes on, and I think trying a www. prefix. From a report on the user's list, it appears that Firefox goes farther and will do a google search, resulting in a tinyurl

Re: Charset normalization issue (report, patch, and request)

2006-01-14 Thread Loren Wilton
As an outsider, I find myself strongly agreeing with Motohraru-san that, when dealing with at least the oriental multibyte languages, tokenization belongs early in the stream, before both bayes and rules. Of course this is an overhead penalty that should not occur on mail that isn't likely to be

Re: [Bug 4766] New: remove SUBJ_HAS_UNIQ_ID and triplets.txt code

2006-01-21 Thread Loren Wilton
in other words it's been dropping from a high of 19.348% of spam to just 0.38% nowadays. Which isn't to say that there aren't unique ids in modern subjects. They just aren't in a form this can detect. :-) Loren

Re: local_state_dir stuff in 3.2 breaks eval rules ...?

2006-02-10 Thread Loren Wilton
default_rules_path (/usr/share/spamassassin) site_rules_path (/etc/mail/spamassassin) default_userprefs_path (~/.spamassassin/user_prefs) Doesn't that imply that site rules override local rules? Surely those are in the other order? Or is there magic when reading the second file

Re: Rule Timeouts (was users@ Re: Two mails completely blocking SA 3.1.0 !)

2006-02-14 Thread Loren Wilton
Should we be wrapping full rules in alarms (using M::SA::Timeout) to prevent this? You can do this with any rule, a full rule is just easier to mess up. I'd be concerned of the overhead (and probable timing holes) in wrapping every rule in an alarm(). As an alternative, how about wrapping

Re: tvd-evaltoplugin

2006-02-25 Thread Loren Wilton
One big plugin would be better than the current split. The current split has no solid technical rationale behind it. - allows eval rules to not be loaded. arguably, most of them will always be enabled, but some could be disabled. DNSEval, for instance, is only useful in net mode. If

Re: move full rule functionality into a default-off plugin

2006-03-08 Thread Loren Wilton
of Bayes and URIBL. There would probably be a much lower-overhead solution, say SpamBayes, if SA's rules capability is effectively removed. Which seems to be the effective intent of this proposal. Loren Wilton

Re: moving uribl to rules dir (Re: svn commit: r383618 - in /spamassassin: rules/trunk/core/25_replace.cf rules/trunk/core/60_whitelist_spf.cf trunk/rules/25_replace.cf trunk/rules/60_whitelist_spf.cf

2006-03-08 Thread Loren Wilton
If it's a plugin, it has to be a code-tied rule! Otherwise it wouldn't need the plugin. Hey, what a neat way to completely disable the initial concept of the Rules project and put things back into the Land Of Arcana where they belong! Just move 'body', 'rawbody', 'header', and 'full' to
