Re: svn commit: r169334 - in /spamassassin/trunk: MANIFEST lib/Mail/SpamAssassin/Conf.pm lib/Mail/SpamAssassin/HTML.pm lib/Mail/SpamAssassin/PerMsgStatus.pm lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm lib/Mail/SpamAssassin/Util.pm rules/20_uri_tests.cf t/uri.t t/uri_html.t
Theo Van Dinter wrote: Sorry to be a killjoy here. I have no problem with the criticism, but I think I've hit the end of what I'm going to do on this one now that it's working without breaking anything. I'm running out of time for some schoolwork that's due in a month and will have to concentrate on that. The changes you are talking about are about cleaner design, and I'm +1 for that. And now is a good time to do it, while the issues are fresh, so that it doesn't become some awkward design embedded in old code. But I won't be making those changes. We're still in CTR mode in 3.1, right? I was acting like it was RTC on this one because it felt right to get feedback first when it looked like it would require some changes to the object design, but I don't think it needs a lot of discussion for the last cleanup, so go to it. -- sidney
Re: proposal for release
Daniel Quinlan said: I propose that we make a 3.0.0 release Are we going to be able to close bug 3675 first (he asks innocently, after having made trouble on reaching consensus on that very bug :-) )? That's the only one left marked with a 3.0 target. -- sidney
Re: proposal for release
Daniel Quinlan said: Yes. It's 4-to-3 in favor of orange +1 for 3.0 release!
Re: [SpamAssassin Wiki] Updated: InstallingOnWindows
(Note: someone may want to address [http://it.slashdot.org/comments.pl?sid=122734&cid=10320250 these complaints] about this document.) I did post a response in http://it.slashdot.org/comments.pl?sid=122734&cid=10325272 [Anyone got some spare mod points? :-) ]. There is one issue I missed and I would like someone who can install SpamAssassin on a Windows machine to confirm something for me as I am temporarily Windows-deprived while my laptop is being repaired. The slashdot post complains about the complexity of the steps the Wiki page lists for generating the HTML doc files. Near as I can tell nmake text_html_doc should be all that is required and would work under Windows. Can someone please verify that and then we can update the Wiki? Thanks, -- sidney
Cluster analysis in Mac spam filter
I stumbled across this article http://www.macdevcenter.com/pub/a/mac/2004/05/18/spam_pt2.html while Googling around for anything that relates cluster analysis techniques to spam filtering. This may be old knowledge to some people here, but was new to me. Apparently the trainable spam filter in Apple's Mail program does not use the Bayesian approach that we are familiar with. It uses a cluster discovery tool that was developed for document search and retrieval. It would be interesting to compare this approach to Bayes. I'm also curious if this provides some hints about using some techniques from bioinformatics (as Justin referred to in a recent message to this list) such as UPGMA cluster analysis (http://www.nmsr.org/upgma.htm). -- sidney
Re: Cluster analysis in Mac spam filter
Henry Stern wrote: Apple Mail uses latent semantic analysis for clustering That sounds right. Some people there were looking at that for document retrieval when I worked at Apple Research in the mid-90's. By the way, have you seen the work applying case-based reasoning to spam filtering? There are two articles on that at http://www.cs.tcd.ie/publications/tech-reports/tr-index.04.html with a bit more at the home page of one of the authors: http://www.comp.dit.ie/sjdelany/ I've been thinking about whether there might be benefit in making finer distinctions than just spam or not-spam, perhaps by clustering into spam topics. Why should the characteristics for porn spam, multilevel marketing spam, Nigerian 419, etc., be combined? Would there be benefit from making their differences explicit? -- sidney
Re: Cluster analysis in Mac spam filter
Henry, In the paper An Assessment of Case-Based Reasoning for Spam Filtering http://www.comp.dit.ie/sjdelany/publications/AICS%202004%20(crc).pdf the authors compare CBR and a naive Bayes (NB) with one conclusion (on their test data, with their implementation of NB) that daily updating of the training data using misclassified mails caused an improvement in FPs but a degradation in FN rate that led to an overall negative effect on their measure of performance. How does that compare to your results on the effect of training and learn on error vs learn on everything? If CBR does end up better than NB when used with learn on error, that is an advantage in terms of computational resources required. -- sidney
Re: Cluster analysis in Mac spam filter
Sidney Markowitz wrote: caused an improvement in FPs but a degradation in FN rate Typo - I left out mention that the result was using NB, and not using CBR. -- sidney
Re: reporting to spamcop
andrew collier wrote: i have the following problem when reporting spam This mailing list is used by SpamAssassin developers to discuss ongoing development work on SpamAssassin. Your question has nothing to do with that. Your question is appropriate for the SpamAssassin users mailing list (see the SpamAssassin wiki article http://wiki.apache.org/spamassassin/MailingLists ) Be sure to search the list archives and the wiki for an answer before you post your question. You get an answer faster by finding it already posted than by asking it again, if the answer is already available. If you have identified a bug in SpamAssassin (which is not in evidence in the message you posted) the appropriate action is to confirm it on the SpamAssassin users mailing list and by searching the SpamAssassin wiki and the Bugzilla database, then report it there. -- sidney
Re: limit on number of URIs decoded?
Justin Mason wrote: The first fix is truncation of the text before passing to TextCat. Michael, I think you were looking at this? the results are impressive, if the text is truncated to 32k bytes: It was me. I've been looking at ways to not have to create so much garbage (I'm a lisp hacker -- I'm not using the word in the pejorative sense) in that loop in create_lm, but the simplest way of dealing with this is to truncate $input to perhaps 10,000 bytes in the call to create_lm. Since TextCat is just a heuristic for determining the language and there is no incentive for spammers to, for example, prefix a Spanish language message with 10,000 bytes of English words just to slip through the spam filters of English-only speakers, the first 10,000 bytes is plenty as a limit. Language recognition accuracy does not improve noticeably past one or two thousand characters, while going to less than 10,000 does not provide much additional speed or memory benefit. If there is no real language text in the first 10,000 characters of rendered body, then it will not be recognized as any language and the rule will not fire, failing safely. I propose putting in the truncate for 3.0.1 as a quick and safe way around the problem we saw with that malformed MIME message. I'll keep playing with the loop just in case I can speed it up enough for the 3.1 time frame to not have to truncate, but we should do the quick fix right away. -- sidney
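The truncation idea in the message above can be sketched in a few lines. This is only an illustration in Python, not the Perl change that went into create_lm; the function name and the constant are hypothetical, and the 10,000-byte figure is simply the limit proposed in the message.

```python
# Hypothetical constant: the 10,000-byte limit proposed in the message.
MAX_LANG_INPUT_BYTES = 10_000


def truncate_for_language_guess(text: str, limit: int = MAX_LANG_INPUT_BYTES) -> str:
    """Return at most `limit` bytes of UTF-8 text without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text
    # errors="ignore" drops a multi-byte character that was cut in half
    # at the truncation boundary instead of raising an exception.
    return encoded[:limit].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    body = "hola mundo " * 5000  # about 55,000 bytes of rendered body
    clipped = truncate_for_language_guess(body)
    print(len(body.encode("utf-8")), "->", len(clipped.encode("utf-8")))
```

The truncated string would then be handed to the language classifier in place of the full body, so the cost of building the n-gram model is bounded no matter how large the message is.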
Re: svn commit: rev 54716 - in spamassassin/trunk: . t
Added: spamassassin/trunk/t/memory_cycles.t I just noticed this now while trying to make test on a machine that doesn't have Devel::Cycle. Is that going to be a documented requirement now? -- sidney
Re: svn commit: rev 54716 - in spamassassin/trunk: . t
Justin Mason wrote: the test should be a no-op without that module did that not work? This is extracted from output of make test, running under Cygwin with perl 5.8.5 t/memory_cycles.Can't locate Devel/Cycle.pm in @INC (@INC contains: t . ../blib/lib /c/sasvn/trunk/blib/lib /c/sasvn/trunk/blib/arch /usr/lib/perl5/5.8.5/cygwin-thread-multi-64int /usr/lib/perl5/5.8.5 /usr/lib/perl5/site_perl/5.8.5/cygwin-thread-multi-64int /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.5/cygwin-thread-multi-64int /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl) at t/memory_cycles.t line 66. BEGIN failed--compilation aborted at t/memory_cycles.t line 66. -- sidney
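The failure above happens because the module is loaded unconditionally at compile time, before the test can decide to skip itself. The general pattern of turning a test into a no-op when an optional dependency is missing looks roughly like the following. It is a Python illustration of the pattern only, not the Perl fix for t/memory_cycles.t, and the function names are hypothetical.

```python
import importlib


def load_optional(name: str):
    """Try to import a module at run time; return None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def run_cycle_checks(module_name: str) -> str:
    """Run the checks only when the optional module could be loaded."""
    module = load_optional(module_name)
    if module is None:
        # Report a skip rather than a failure: the dependency is optional.
        return "skipped: %s not installed" % module_name
    return "ran"


if __name__ == "__main__":
    print(run_cycle_checks("no_such_cycle_checker"))
```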
Re: [Query] Whitelist
ratan kamath wrote: Query: If a mail arrives [...] This mailing list is used by SpamAssassin developers to discuss ongoing development work on SpamAssassin. Your question has nothing to do with that. Your question is appropriate for the SpamAssassin users mailing list (see the SpamAssassin wiki article http://wiki.apache.org/spamassassin/MailingLists ) Be sure to search the list archives and the wiki for an answer before you post your question. You get an answer faster by finding it already posted than by asking it again, if the answer is already available.
Help with bug 3917
Fred, I noticed you mentioned in a bug comment about getting some information using Ethereal. If you are also running Cygwin, could you help a bit with bug #3917? I'm stuck because of some firewall issues that I have not yet tracked down on the home machine where I can test. What I'm trying to do is get a network capture of the problem to see what exactly is failing when there is the protocol error. I have shown that running the test case using spamd on Cygwin and spamc on another box (which could be linux) will demonstrate the problem. Unfortunately, Ethereal (or anything using winpcap) will not capture anything when the client and server are on the same machine, and I can only get my machines to talk through an ssh tunnel which prevents sniffing. So if you have Cygwin and another box and the time and can reproduce the problem, that would help. Thanks, -- sidney
Re: ?
Alexandr Orlov wrote: X-Spam-Status: SpamAssassin Failed It does not appear anywhere within the SpamAssassin source code. Googling for that exact header showed up a number of messages with it, all spam. At first I thought it must be a fake header added by some spammers to try to fool SpamAssassin, but it always appears at the top of the mail, after an Envelope-To header and before the first Received header. I don't see how a spammer could place a header there. Check with a sysadmin for the mail server from which you receive mail to see if they add that header and why. It might make a good spam sign... Does anyone here see the header in corpus mail? -- sidney
Re: svn commit: r106170 - /spamassassin/trunk/spamd/spamd.raw
Daniel Quinlan wrote: Please try to use the more standard perl formatting: Do you see anything wrong other than two of the lines being more than 80 characters? I'll check in an update to fix that as soon as I finish running a make test on the change. -- sidney
Re: svn commit: r106170 - /spamassassin/trunk/spamd/spamd.raw
Justin Mason wrote: Sidney -- I think it's the foo( bar ) vs. foo(bar) I prefer that too. I copied the style that was already in the code, and I looked for something about that in the style guide and did not see any mention of it one way or the other. Unless it is there and I missed it, you or Daniel should add something about it on the wiki page. -- sidney
Re: svn commit: r106173 - /spamassassin/trunk/spamd/spamd.raw
Daniel Quinlan wrote: Heh, I was most talking about the paren style, actually, not the line length (although now that you mention it). There are a few hundred spaced parens in spamd.raw. I'll fix the lines I changed if you want, but if it's ok with you I won't do a massive edit of the file. Or I can just keep it in mind for next time I check something in. -- sidney
Re: svn commit: r106170 - /spamassassin/trunk/spamd/spamd.raw
Daniel Quinlan wrote: * No space between function name and its opening parenthesis. I did see that. That would allow foo( bar ) which is what I did. If you want foo(bar) as a preferred style it would have to be added to the wiki page. -- sidney
Re: svn commit: r106600 - /spamassassin/trunk/t/SATest.pm
I just tried a quick build and make test in Windows XP to see what it would do, and 1. I could not reach the svn server from svn, although I could ping it. Is it down? 2. I got lots and lots of Use of uninitialized value in concatenation (.) or string at ..\lib/Mail/SpamAssassin/ArchiveIterator.pm line 1023. 3. I realized that I would not be able to test the use of netstat anyway because Windows does not run spamd. You can set environment variables to tell the spamc tests to assume that spamd is already running on some ip address and port, but that isn't relevant to this issue. -- sidney
Re: svn commit: r106600 - /spamassassin/trunk/t/SATest.pm
The error message from ArchiveIterator.pm is because Windows does not define a $HOME environment variable by default. It has $HOMEDRIVE and $HOMEPATH which together serve the same purpose. The code in ArchiveIterator.pm has to be changed to check for Windows, or else we can document the need to set a $HOME. Do we use $HOME anywhere else? I just found it because I used to have HOME defined in my XP environment for other reasons. -- sidney
Re: svn commit: r106600 - /spamassassin/trunk/t/SATest.pm
Malte S. Stretz wrote: What does getpwuid() say on Windows? Not implemented :-) You can't use getpwuid in Windows. The usual portable implementation checks for running under Windows and uses $ENV{'HOMEDRIVE'} . $ENV{'HOMEPATH'} if it is instead of $ENV{'HOME'}, being careful about the former using '\' separators instead of '/'. -- sidney
Re: svn commit: r106600 - /spamassassin/trunk/t/SATest.pm
Malte S. Stretz wrote: So maybe we should add a M::SA::Util::get_home() which first tries $ENV{HOME}, then on Windows $ENV{HOMEDRIVE}\$ENV{HOMEDIR}, then portable_getpwuid()[7], then... foo? portable_getpwuid() doesn't seem to do anything useful under Windows for this purpose and shouldn't be needed anyway. It just returns 'unknown' for the name, which works when you don't care about the actual user name. The first two steps are fine, and probably enough, except that you would not have to add the '\' separator, it is already in HOMEDIR. Question: In ArchiveIterator.pm does everything work if that is what it uses for HOME or does anything have to be done to convert \ to / ? -- sidney
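The first two lookup steps discussed in this thread can be sketched as follows. This is a Python illustration, not the proposed M::SA::Util::get_home() itself. It uses HOMEDRIVE and HOMEPATH as named earlier in the thread, and the conversion of backslashes to forward slashes is an assumption, since whether that conversion is needed is exactly the open question in the message above.

```python
import os
from typing import Mapping, Optional


def get_home(env: Mapping[str, str] = os.environ,
             on_windows: Optional[bool] = None) -> Optional[str]:
    """Return the user's home directory, or None if it cannot be determined."""
    if on_windows is None:
        on_windows = os.name == "nt"

    # Step 1: an explicit HOME wins on every platform.
    home = env.get("HOME")
    if home:
        return home

    # Step 2: on Windows, combine HOMEDRIVE and HOMEPATH.
    if on_windows:
        drive = env.get("HOMEDRIVE", "")
        path = env.get("HOMEPATH", "")
        if drive and path:
            # HOMEPATH already starts with a backslash, so no separator is added.
            # Assumption: callers prefer forward slashes.
            return (drive + path).replace("\\", "/")

    return None


if __name__ == "__main__":
    print(get_home({"HOMEDRIVE": "C:", "HOMEPATH": "\\Documents and Settings\\sidney"},
                   on_windows=True))
```

Passing the environment and the platform flag as parameters keeps the function testable on any machine; real code would simply call it with the defaults.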
Re: svn commit: r106600 - /spamassassin/trunk/t/SATest.pm
Malte S. Stretz wrote: oops :) But I'm glad you didn't notice my HOMEx debugging glitch :) I did, but I understood what it was for :-) I spoke too soon about it working. When I add a -w to the perl command it barfs in catpath, I think because it expects to be passed all three arguments, volume, dir, and file. I'll try adding a third argument of '' and see what it does. Or I could try reading the doc on catpath first :) -- sidney
Re: MIT spam conference
Daniel Quinlan wrote: [EMAIL PROTECTED] (Justin Mason) writes: CFP ends in 4 days though. If the trend in conference quality continues Oh, then I _do_ have time to design, research, write, and propose a paper for it! -- sidney :-)
Re: Can anyone here write some plain English?
Loren Wilton wrote: Doesn't the free VC install include nmake? The normal one does. No, that's the problem. No nmake, no winsock.h, necessitating two more big downloads in addition to the free toolkit. The DDK also includes Nmake, and a considerably newer version than what Well, I guess that can be mentioned as yet another alternative if it is more practical to order a CD than to download a few hundred megabytes. At some point it may be too much information for a readme file and more appropriate for a page in the wiki. -- sidney
Re: Idea: New way to train Bayes
Any comments? Interest in co-authoring a research paper (*poke*, I might have some ideas about it... especially if it could be related to classification of cancer cells based on microarray gene expression data :-) Now I have something to think about on my ferry commute this morning. -- sidney p.s. Finish your thesis first :-)
Re: Idea: New way to train Bayes
Nick Leverton said that papers he has seen found that learn on error always works better than learn everything. But I recall one that looked more carefully at longer term results and found that learn on error degrades over time. They found it best to retrain on fresh data every few months. (I don't have the reference handy). That makes sense if you consider that spam (and possibly ham) patterns change over time, even more so to the degree that spam patterns are actively adapting to try to beat spam filters. BTW, at least one spam learning filter I've seen reduces its memory requirements by using a small hash size (like 32 bits) for representing tokens. Such systems will show poorer results for learn everything compared to learn on error simply because of collision effects once they learn too many tokens. What I haven't seen discussed is the effect of token expiration as is done in SpamAssassin. Wouldn't that produce the same effect as periodic retraining, thereby allowing learn on everything to work well? Doesn't that prevent the problems of converging to a mean and slowing down the learning? How does the effect of token expiration compare to the use of back-propagation? -- sidney
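To make the token expiration idea concrete, here is a minimal sketch of a token store that forgets tokens not seen recently. It is a toy model in Python under stated assumptions: the class and field names are hypothetical, and the simple age cutoff is not a description of the expiry algorithm SpamAssassin actually uses.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class TokenStats:
    spam_count: int = 0
    ham_count: int = 0
    last_seen: int = 0  # timestamp of the last message that contained the token


class ExpiringTokenStore:
    """Toy learn-on-everything store with age-based token expiration."""

    def __init__(self, max_age: int) -> None:
        self.max_age = max_age
        self.tokens: Dict[str, TokenStats] = {}

    def learn(self, tokens: Iterable[str], is_spam: bool, now: int) -> None:
        for token in set(tokens):  # count each token once per message
            stats = self.tokens.setdefault(token, TokenStats())
            if is_spam:
                stats.spam_count += 1
            else:
                stats.ham_count += 1
            stats.last_seen = now

    def expire(self, now: int) -> int:
        """Drop tokens not seen within max_age; return how many were dropped."""
        stale = [t for t, s in self.tokens.items() if now - s.last_seen > self.max_age]
        for token in stale:
            del self.tokens[token]
        return len(stale)
```

Because stale tokens drop out while fresh ones keep accumulating counts, the store gradually tracks current mail, which is the resemblance to periodic retraining that the message asks about.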
Re: YOU ARE ON THE WAY TO DESTRUCTION
Daniel Quinlan wrote: Bugzilla says we can release 3.0.2 so I therefore propose we release 3.0.2. +1! -- sidney http://www.sidney.com
Re: buildbot failure in [...]
Justin Mason wrote: (b) however the -parker- and -sidney- ones *are* getting annoying. ;) I suggest we turn off those slaves until we can figure out how to get buildbot to work with dynamic-IP slaves... I'm running three slaves on one machine, two of them on the same VMWare virtual machine and one running native. Most of the time they do not generate errors. I have a static ip. The problem cannot be that buildbot doesn't work with such a configuration, or else it would never work. I wonder if svn has trouble with all the clients trying to run at the same time on the same physical machine. -- sidney
Re: buildbot failure in [...]
Justin Mason wrote: Sidney, have you tried setting --keepalive=300 I'll try that. What Michael says does make sense. I'm behind a NAT. Is there a way of setting a port that the slave listens on? I can configure my NAT to let the slaves be designated servers on some port if I can make it a fixed port and assign a different port number to each of them. I'm sure if it is possible I could find it by RTFM, but I have not had a lot of time to learn about buildbot and twistd. By the way I have to call twistd directly instead of buildbot in order to get everything to work in Cygwin and Win32. They need the -n option in order to run, and in Win32 I have to give it the -r win32, which I would have expected to be automatic when running a win32 buildbot. Cygwin command: twistd -l - -n -f ../buildbot.tap Win32 command: twistd -l - -n -r win32 -f ..\buildbot.tap -- sidney
Re: buildbot failure in [...]
Justin Mason wrote: might be worth signing up to buildbot-devel (it's very low traffic) and mention that... I'm going away on holiday soon for a couple of weeks. I'll look at that after I come back. There may be some issues to work out if I'm going to test with their latest cvs version and that's not what our server is running, and I won't have time for it before I leave. I did find where to stick the --keepalive option. Buildbot doesn't take it, so I hardcoded it at the end of the mktap command line that is put together in runner.py. Cygwin and Win32 are running with it now. I'll restart the Fedora Core 3 one as soon as I finish a system update I'm doing on that machine right now. -- sidney
Make test failure in SPF test
I'm seeing the following in make test in the spf test. It doesn't show in the buildbot test because they skip SPF. (As an aside, why do they skip it?)

    $ t/spf.t
    1..2
    # Running under perl version 5.008005 for cygwin
    # Current time local: Sun Dec 19 09:49:57 2004
    # Current time GMT: Sat Dec 18 20:49:57 2004
    # Using Test.pm version 1.25
    /usr/bin/perl -T -w ../spamassassin -C log/test_rules_copy --siteconfig path log/localrules.tmp -p log/test_default.cf -t data/nice/spf1
    Checking helo_pass
    Not found: helo_pass = SPF_HELO_PASS
    not ok 1
    # Failed test 1 in t/SATest.pm at line 549
    Checking pass
    Not found: pass = SPF_PASS
    not ok 2
    # Failed test 2 in t/SATest.pm at line 549 fail #2

-- sidney http://www.sidney.com
Re: buildbot failure in [...]
I had what I think was a power glitch here which rebooted the server. I think it happened in the middle of the svn update, causing all three slave jobs to fail. I'm not going to bother to bring the buildbot slaves online again before I leave on holiday. The keepalive may have kept them going ok before the power failure, but it was too short a timespan to be sure. -- sidney
Re: Unsuscribe?
William Holman wrote: I've been over-ruled by those who pay the bills, so I can't use SpamAssassin since it's open source What bills? -- It's open source! :-) If you look at the SpamAssassin wiki you can find a list of products that are based on SpamAssassin that your billpayers can feel happy paying for while not getting access to all the source code. That way they can continue to be blissfully ignorant suckers and you can still use SpamAssassin. (I don't mean to reflect on the commercial products, only on the apparent attitude of your bill payers). http://wiki.apache.org/spamassassin/CommercialProducts How do I unsubscribe from the lists? If you view the headers of any email on these mailing lists you will see a header like this one on this list: list-unsubscribe: mailto:[EMAIL PROTECTED] Send an email to that address from the address that you want to unsubscribe and it will be done. Subject and body text are ignored. -- sidney http://www.sidney.com/
Re: Target Milestone of Future is harmful
Justin Mason said: It's a manageability thing. Another way to look at is that the only person you can be certain has an interest in a new bug report is the one who submitted it. If the target milestone is still Future and you feel strongly about it, then it is up to you to evangelize the bug report until a developer is convinced to change the milestone. Submitting a patch is one of the most effective ways to do that. I think that a bug remaining with target Future is more symptom than cause. It allows bug reports that appear not to be important to address right now to not get lost while still being mostly ignored. Anyone who cares about a specific bug report should speak up and make their case for it. -- Sidney Markowitz http://www.sidney.com
Re: SURBL whitelist volume chicken-egg problem
Robert Menschel said: Don't drop the 125, but simply add to the whitelist a number of the new top 100 I like that idea as a second choice. If the list is only updated when there is a new release of SpamAssassin then it will not grow too rapidly. It would be quite a few years to get to a table of a thousand entries. There would be a minor problem with a whitelisted domain expiring and getting snapped up by a spammer. That could be taken care of by the SURBL people checking if a domain that is being added to the SURBL is on the whitelist and informing the SA team so it can be removed. But it is only my second choice, if there is no way for SURBL to monitor domains in email independent of the SA queries. I like the idea of them getting feeds from ISPs like the one sonic.net offered. That way they can maintain a current list of most common domains in ham mail independent of the SpamAssassin release cycle. SpamAssassin could download the list more or less often depending on how volatile the list is. My guess is that monthly is fine, as that is much better than once per SA release cycle. Sidney Markowitz http://www.sidney.com
Re: SURBL whitelist volume chicken-egg problem
Jeff Chan said: There are a number of reasons for not doing a whitelist RBL: 1. Excessive queries: Whitehat domains come up a lot in messages. I was thinking along the lines of something that SpamAssassin downloads once a month, or queries to find out if there is an update available and only downloads if there is. Since the idea is to limit DNS queries, of course it would not be implemented as a DNS-based whitelist that is checked for every URI. It could be stored on a DNS if you could trust people not to misuse it, but it must be designed for infrequent downloads in bulk, with queries of URIs done to a local database. 2. Potential misuse: Inadvertently blacklisting whitehats, i.e. user error. If it is separate enough from the blacklist, i.e., it is queried and used in a totally different way than a DNS query of each URI domain, then I don't see much potential for misuse. You simply have a list of the top n non-spam domains that can be downloaded in bulk and document how to do it and that it is to be used to reduce the number of DNS queries. 3. Possibility of negative scoring: Some application would probably try to negative score them SpamAssassin would not do it. You would not encourage that. Your documentation would make it clear that it is a list of domains not to bother DNS querying that do not indicate either spam or ham when they appear in an email. Even if some misguided programmer missed all that, I don't see how it would be in a mainstream popular antispam program with enough use to affect spammers' behavior. Sidney Markowitz http://sidney.com
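The design described above, a list downloaded in bulk and consulted locally so that common ham domains never trigger a DNS query, might look roughly like this. It is a sketch in Python under stated assumptions: the one-domain-per-line file format, the '#' comments, and every function name are hypothetical, and matching of subdomains against a listed parent domain is a design choice made here for illustration.

```python
from typing import Iterable, List, Set


def load_whitelist(lines: Iterable[str]) -> Set[str]:
    """Parse a bulk-downloaded list: one domain per line, '#' starts a comment."""
    entries: Set[str] = set()
    for line in lines:
        domain = line.split("#", 1)[0].strip().lower()
        if domain:
            entries.add(domain)
    return entries


def is_whitelisted(domain: str, whitelist: Set[str]) -> bool:
    """True if the domain or any parent domain is on the local list."""
    labels = domain.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in whitelist for i in range(len(labels)))


def domains_to_query(domains: Iterable[str], whitelist: Set[str]) -> List[str]:
    """Keep only the URI domains that still need a DNS blocklist lookup."""
    return [d for d in domains if not is_whitelisted(d, whitelist)]


if __name__ == "__main__":
    wl = load_whitelist(["# common ham domains", "example.com", "w3.org"])
    print(domains_to_query(["www.example.com", "unknown-shop.test"], wl))
```

Note that a whitelisted domain is only skipped, never scored: the lookup result feeds nothing back into the spam score, which matches the point made under item 3.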
Re: SURBL whitelist volume chicken-egg problem
Daryl C. W. O'Shea said: The emails generated could be used to calculate the domains most often seen. I would be afraid of it being too easy for malicious people to hack by sending in false data, DoS attacks on the email addresses, etc. Also there is no reason to load down some email address with data from everyone who is running SpamAssassin. Feeds from a few large ISPs would be accurate enough for the purpose and more trustworthy. Sidney Markowitz http://www.sidney.com
spamassassin svn server appears to be down
Is anyone else seeing problems accessing the SpamAssassin svn? I can't connect to the server using svn from my machine and http://svn.apache.org/viewcvs.cgi/spamassassin/trunk/?root=Apache-SVN does not respond either. Ping works. -- sidney
Re: [Bug 4124] New: New spamassassin script doesn't work due to tainting
Malte S. Stretz wrote: Ok, I added some code for this in r153131. Could you please test it (just do a 'make clean; make'), especially on Windows? Ok, my Windows machine is working and the disk is mostly restored now, and I found the right thread to report this... Malte, the current makefile is broken for Windows in one place. Line 1152 of Makefile.PL has

    ifeq $(INSTALLDIRS) site
    INSTALLSCRIPTREALLY = $(INSTALLSITEBIN)
    else
    INSTALLSCRIPTREALLY = $(INSTALLSCRIPT)
    endif

In Windows we use nmake instead of GNU make, which has different conditional syntax:

    !IF $(INSTALLDIRS) == site
    INSTALLSCRIPTREALLY = $(INSTALLSITEBIN)
    !ELSE
    INSTALLSCRIPTREALLY = $(INSTALLSCRIPT)
    !ENDIF

Everything else seems fine when I change that in the generated makefile. I'll leave it up to you to decide the cleanest way to either conditionalize that for Windows or eliminate the need for a preprocessor conditional. -- sidney
Re: [Bug 4124] New: New spamassassin script doesn't work due to tainting
Daniel Quinlan wrote: We support nmake? That's the Microsoft nmake, not to be confused with any other make program of the same name. It's what is available on Windows. For compatibility we have to put all the fancy logic in the perl of Makefile.PL so the resulting makefile is written to a dumbed down common denominator. I'll test out Malte's fix when he checks it in. -- sidney
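One way to read "put all the fancy logic in the perl of Makefile.PL" is that the generator resolves the condition itself and writes a plain assignment that every make dialect accepts. The sketch below shows that idea in Python rather than Perl. The function name is hypothetical and this is not the fix that was actually checked in.

```python
def install_script_assignment(installdirs: str) -> str:
    """Emit a plain makefile assignment with the condition already resolved.

    The generator decides at generation time, so the output contains no
    ifeq/!IF directive and is valid for both GNU make and nmake.
    """
    if installdirs == "site":
        target = "$(INSTALLSITEBIN)"
    else:
        target = "$(INSTALLSCRIPT)"
    return "INSTALLSCRIPTREALLY = " + target


if __name__ == "__main__":
    print(install_script_assignment("site"))
```

The trade-off is that the choice is frozen when the makefile is generated: overriding INSTALLDIRS later on the make command line would no longer switch the script directory.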
Re: [Bug 4124] New: New spamassassin script doesn't work due to tainting
Malte S. Stretz wrote: Sidney, could you test r154095 on Windows please? It works. BTW, my buildbot slaves are running again so you can see immediately, e.g., http://bugzilla.spamassassin.org:8010/trunk-sidney-win32/builds/51 -- sidney
Re: [PATCH] Config file for spamc
John Madden wrote: So, I put together this patch. It causes spamc to read /etc/mail/spamassassin/spamc.conf (if it exists) John, Would you open a ticket for this on Bugzilla at http://bugzilla.spamassassin.org as an RFE (severity: enhancement) and attach your patch there using the Create a New Attachment button? That will keep it from getting lost or overlooked. I don't think the patch is complete, as it is more *nix specific than spamc itself is. There needs to be something to specify a different configuration file for Windows, VMS, etc., either by compile-time conditionalization or a preprocessor variable that can be set in the makefile used on different platforms. That said, if you post what you have in Bugzilla, anyone who wants to finish it can work on it, and anyone who wants to use it as is on their own site will be able to find it when they search the bug list for this problem. Thanks, -- Sidney Markowitz http://www.sidney.com/
make test failures
t/debug.t and t/spf.t both have failures. I'm not sure how long ago they started failing as the failures are hidden by the warning-only failures in rule_names.t. Is there a way that we can distinguish between rule_names and the other failures so that we can go back to sending notification emails on buildbot failures? -- sidney
Re: make test failures
I fixed the test failure in t/debug.t checking in to r155617. The test was just missing a new dbg message tag, replacetags, so I added it to the list. I'm less sure about what is the correct thing to do for the failure in t/spf.t. In that case there is a test for SPF_HELO_FAIL in the test spam. But as far as I can tell, spamassassin.org has a ?all in its SPF record, which should mean that the result code of 'neutral' for the helo test is correct. Should it fail? Do we need a new test case that generates an SPF HELO failure? -- sidney
Re: make test failures
Justin Mason wrote: According to the SPF people, we shouldn't be using -all on a domain that may possible emit mail. So I changed the record... That can't be right. Try out the wizard at http://spf.pobox.com/wizard.html?mydomain=spamassassin.org It gives you two choices in the last question Do the above lines describe all the hosts that send mail from spamassassin.org. If the answer is yes, you get ~all in the record, if it is no you get ?all. If you can list all sending domains, sending ip addresses, and ISP mail servers that are allowed to send mail from a spamassassin.org address, then you can use ~all and we can use from spamassassin.org in the SPF test for a failed HELO. If you can't list all of them in the record, we are forced to use ?all and we need a different domain to use for the test. I don't want to make the change to enable the test for Windows until we have the test fixed to not fail. -- sidney
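For reference while reading this thread, the SPF specification defines four qualifiers, and the qualifier on the trailing "all" mechanism determines the result when nothing earlier in the record matches. The sketch below, in Python, only looks at that trailing mechanism. It is not an SPF evaluator, the function name is hypothetical, and the records in the usage example are made up rather than taken from any real domain.

```python
from typing import Optional

# Qualifier-to-result mapping as defined by the SPF specification.
QUALIFIER_RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def all_mechanism_result(record: str) -> Optional[str]:
    """Return the result produced by the record's 'all' mechanism, if it has one."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return None
    for term in terms[1:]:
        qualifier, mechanism = "+", term  # '+' is the default qualifier
        if term[0] in QUALIFIER_RESULTS:
            qualifier, mechanism = term[0], term[1:]
        if mechanism.lower() == "all":
            return QUALIFIER_RESULTS[qualifier]
    return None


if __name__ == "__main__":
    for rec in ("v=spf1 a mx ?all", "v=spf1 a mx ~all", "v=spf1 a mx -all"):
        print(rec, "->", all_mechanism_result(rec))
```

A test that expects a specific rule to fire has to be matched against the exact result name, since the three non-pass qualifiers give three different results.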
Re: make test failures
Justin Mason wrote: According to the SPF people, we shouldn't be using -all on a domain that may possible emit mail Even if, as I think, ~all is correct if you can enumerate all legal senders for the domain, there still is a problem with making our test depend on the current configuration of something that is being used for some other purpose. There is always the risk that there will be a reason for changing the configuration. I got an idea from the tests in Mail::SPF::Query. How about if you define a spf-test.spamassassin.org domain with an SPF record with ~all. Then you are guaranteed that it will generate a fail but it can't mess up any real email. -- sidney
Re: svn commit: r156102 - in spamassassin/trunk: lib/Mail/SpamAssassin/Plugin/Razor2.pm rules/50_scores.cf
Shelby, This mailing list is for developer discussions. Developers consist of the people who have commit access to our source control system, SVN. As per Apache Foundation policies, the development process is transparent. That means that the technical and design discussions we developers have and all other parts of our decision process are held in public view. Suggestions from the public and our responses to them are in this same public forum. In the end, we make the decisions, following a documented consensus-like process in which any developer has veto power and only developers have a vote. The process would not work if every message from any one of us is followed by a disagreeing comment from a non-developer. At some point we need to be able to have our discussion and not to rehash those aspects of the discussion on which we as developers agree. We certainly should not have to deal with basic questions about the terminology we use every day in our discussions such as the acronym for our source control system, SVN. You have made your points about your software. I believe that we have stated clearly that we are not interested in hearing more about it until you have some code and results that we can look at and test and that you are ready to offer in a form that is compatible with our license. This mailing list is for developer discussions. I could try to explain what that means, but I'm afraid that you may not have the awareness of personal or social boundaries to be able to use the explanation. I'll put it in quantitative terms: 50% of the last 24 messages in my mailbox for this list are from you. If you can keep the proportion of emails from you to this list down to an amount typical of any other single non-developer, then you will have some assurance that you are not making inappropriate posts. Please do not reply with a rebuttal to this email. Or even an apology if you are so inclined. 
The fact that I think that your recent posts to this list are off topic for the list is not debatable, it is my opinion. It is shared by others of us who are responsible for keeping this project together. We need to keep the noise level down and get back to having developer discussions and writing code. We'll see you again if and when you have some code to share. Thank you, -- sidney
Re: svn commit: r156102 - in spamassassin/trunk: lib/Mail/SpamAssassin/Plugin/Razor2.pm rules/50_scores.cf
Daniel Quinlan wrote: aspects of the AL 2.0 don't really translate to services, but use does and that's my main concern with Razor2. I find Theo's argument that use of the razor server is always free to a user of a free SA distribution compelling. Code being free but charging for service is in the best tradition of Free and of Open Source software. Redhat's up2date is open source code (GPL?), using it to access their server possibly costs money. Email client software can be free while the account on the mailhost it talks to costs money. If the razor services are free to anyone who has not paid for the client, that is even more liberal than most service-based systems. It also means that anyone to whom we distribute SpamAssassin can use the razor servers for free, which seems compatible not just with the letter but also the spirit of the Apache License. -- sidney
Re: bug squash next week?
I vote +0.5 for Fri Mar 11. I'm voting for that date because it is a weekend here on the other side of the world, which is the only time I can do anything. I'm only voting 0.5 because I probably still won't have much time, even on a weekend :-(. -- sidney
Re: header modification
Frederik Eaton wrote: Is it possible to configure spamassassin to get back the original functionality of only modifying headers of spam 1. Look up the doc on rewrite_header and report_safe in man Mail::SpamAssassin::Conf or other documentation 2. Any further questions about this or similar topics should be directed to the SpamAssassin users mailing list, not to here. This list is for developer discussions only. Don't even reply to this with an apology or a thank you. I'll pretend that you have replied politely and leave it at that :-) -- sidney
Re: svn commit: r156102 - in spamassassin/trunk: lib/Mail/SpamAssassin/Plugin/Razor2.pm rules/50_scores.cf
Daryl C. W. O'Shea wrote: Shouldn't people evaluate whether or not they are eligible to use Razor2 before downloading (and installing) the razor-agents from Vipul's website? That was the substance of the reply I tried to write last night but was too sleepy to finish. I thought about how I never configure razor in my test installations and wondered how that happened when I was pretty much taking defaults if we supposedly support razor out of the box. I realized it's because you don't get razor unless you explicitly install Razor2 module from CPAN. So we do not distribute SpamAssassin configured to run Razor. We distribute it configured to use Razor if the Razor2 module is installed on the machine. Installing Razor2 is what gets someone involved with the license to use the service. As a result of that, I am now +1 on having the line to include razor being in init.pre and I am an agnostic +0 on whether it is commented out. As far as I can see it makes no difference if enabling the razor plugin requires only installing Razor2, or if it requires installing Razor2 and also uncommenting a line in init.pre. -- sidney
Re: svn commit: r156102 - in spamassassin/trunk: lib/Mail/SpamAssassin/Plugin/Razor2.pm rules/50_scores.cf
Duncan Findlay wrote: That's arguably a bug in the operating system then I don't think it is even that, but I agree with you that it is not our place to work around it. Consider this: Razor is free to use if the client software is free. The client module may come freely with the OS. The client plugin is freely available from SpamAssassin. The only way it costs money to use it from SpamAssassin is when somebody packages SpamAssassin with something else as a commercial product and sells it. (Is that true? Does a large ISP who uses SpamAssassin have to pay to enable razor on their high volume site?) Someone who sells such a commercial package is responsible for the configuration that they ship it with. At that point we are not talking about the default configuration of the free core distribution of SpamAssassin. So I don't see the need from a licensing point of view of disabling razor in init.pre. And I'm still +0 on commenting it out anyway. At some point I guess it will be time to stop discussing this if the votes are all that nobody is -1 on commenting out the line in init.pre and Daniel is strongly +1 on commenting it out. -- sidney
Daniel and SpamAssassin are on Slashdot!
Daniel and SpamAssassin are on Slashdot! http://it.slashdot.org/article.pl?sid=05/03/04/2010218tid=111 -- sidney
Re: svn commit: r156102 - in spamassassin/trunk: lib/Mail/SpamAssassin/Plugin/Razor2.pm rules/50_scores.cf
Shelby Moore wrote: Sidney Markowitz wrote: This mailing list is for developer discussions. I could try to explain what that means, but I'm afraid that you may not have the awareness of personal or social boundaries to be able to use the explanation. There you go again trying to ERRONEOUSLY inpune my character. Again? I stand by the politeness of the one other message I posted in reply to your original proposal. I apologize if you consider my statement about personal boundaries an insult. To me it is a cultural/personality difference that is not worth trying to work around, hence my appeal to numbers instead of trying to convince you that there was anything one might find annoying in your posts. This mailing list is for developer discussions. Developers consist of the people who have commit access to our source control system, SVN. No where is that stated in public: The only description of this mailing list, the one linked to by the Lists link on the spamassassin.org home page, says that explicitly: http://wiki.spamassassin.org/MailingLists#head-f67bc6dad74f08d4d8b6187fc92476b5a2aa4a2b Unless you are looking for a definition of developers. The one I provided (all committers) is not stated explicitly anywhere, but no reasonable definition of developer would include more than the larger list of contributors at http://svn.apache.org/repos/asf/spamassassin/trunk/CREDITS In any case this is way off topic for a mailing list for developer discussions. This is not a developer discussion. I will take your advice about ignoring mail from you to avoid further off topic discussion. -- sidney
Re: header modification
Frederik Eaton wrote: As developers, you might want to add that information to the part of the man page I quoted I assume that you are referring to the released version of SpamAssassin. Looking at our latest development version I see that the wording has already been changed to make that clearer in the next release. -- sidney
Re: header modification
Frederik Eaton wrote: Also, with all due respect, you really didn't have to be such an asshole Reading my words quoted back to me, I agree. The question as you asked it was more appropriate for the users list. My response to that effect was posted to the list because people reading this list should see that before they post. The "Don't even reply..." part was intended to be in a lighthearted tone, and I see now that it does not come across as I intended. For that I apologize. Your post pointing out the misleading documentation once you had the correct information was completely on topic for this list. I thank you for pointing out the error even if it was one that had been found and corrected in our development tree. -- sidney
Re: Fw: Spam - Internet gaming industry, Gaming Transac
Thanks for your interest in helping improve things, but please read http://wiki.spamassassin.org/DoYouWantMySpam for the FAQ about not sending spam samples to our mailing lists. -- sidney
Re: client SMTP authorization
Tony Finch wrote: Is anyone planning to implement CSA for SpamAssassin? I'm not, but I do have a question about it. Is it something that would best be implemented on the MTA to reject fake SMTP servers, or does it have a maybe case which would be best handled by a SpamAssassin rule without outright rejecting the mail? -- sidney
Re: Proposal: 3.0.3 release schedule
Duncan Findlay wrote: That's a pretty significant change for a maintenance release. Yes, and I mention it to bring it to his attention. I guess it's up to him to decide whether or not to back port the patch, and then it is up to us whether to accept it in an official 3.0.3 release, just like it is up to us whether there is any official 3.0.3 release, and it is up to the Fedora crew what they want to go into their FC4 distro. I do think that with most of the change being encapsulated in a new object and with the old code being definitely wrong, they might decide that it is worth fixing all of those bugs with one change. Personally, I'm the reckless type who would try to get 3.1 into FC 4. Lucky for them I'm not involved with that :-) -- sidney
Re: svn commit: r164278 - /spamassassin/trunk/t/uri.t
[EMAIL PROTECTED] wrote: Added testcase from Bug4191 This test fails on my Fedora Core 3 system with svn trunk even though bug 4191 has a comment that says that it is fixed in 3.1. t/uri...FAILED test 77 Failed 1/76 tests, 98.68% okay -- sidney
Re: uridnsbl: bogus rr run ...
Theo Van Dinter wrote: I have ~300K of them. http://www.kluge.net/~felicity/set1.txt This should not be happening anymore since the patch for bug #4260 was committed to trunk. Are you still getting them? The warning was only there to help us track down that problem. If we are sure that the problem has been fixed I'm also +1 on removing it. It would be nice to know if it happens, but if the problem has been fixed it is just some extra code that will never be run. -- sidney
Re: uridnsbl: bogus rr run ...
Theo Van Dinter wrote: The output is from my Saturday weekly net run. It looks like 4260 was committed as r161778, the nightly run was r164362. Yuck, this looks like you are still getting DNS records in the wrong order. Look at that first log entry. It says that a query for usafreemerchantsource.com.multi.surbl.org is getting the response that is correct for query for dns7.hichina.com. That's the problem that the warning is supposed to help us catch. That should no longer be possible given that there is a unique ID associated with each query and it is supposed to match the ID in the response. This is serious. It certainly proves the worth of having the warning in there. -- sidney
Re: uridnsbl: bogus rr run ...
Theo Van Dinter wrote: It looks like 4260 was committed as r161778, the nightly run was r164362. Do people think we should reopen 4260? This could happen if the random ID isn't random enough or 16 bits isn't large enough to avoid collisions. I don't see how that would happen if different processes choose different ports to listen on, as there should be no way then for queries to collide across processes and with the ID being incremented each time there should be no collision within the same process. If somehow the IDs are colliding, the fix would be to include some information from the question along with the 16 bit ID to prevent that. I have a small patch that will do that, but I would like to see it used in a test to find out if it has anything to do with the problem before proposing to use it for real. Theo, would you be willing to run a mass test with this to see if it helps?

$ svn diff lib/Mail/SpamAssassin/DnsResolver.pm
Index: lib/Mail/SpamAssassin/DnsResolver.pm
===================================================================
--- lib/Mail/SpamAssassin/DnsResolver.pm (revision 164463)
+++ lib/Mail/SpamAssassin/DnsResolver.pm (working copy)
@@ -45,7 +45,7 @@
 use Mail::SpamAssassin::Logger;
 use IO::Socket::INET;
-
+use Digest::SHA1 qw(sha1_base64);
 our @ISA = qw();
 # a counter value to use for DNS ID numbers in new_dns_packet().
@@ -243,8 +243,8 @@
   return if $self->{no_resolver};
   my $pkt = $self->new_dns_packet($host, $type, $class);
-
-  my $id = $pkt->header->id;
+  $host =~ s/\.$//;
+  my $id = substr(sha1_base64($host . $pkt->header->id), -8);
   my $data = $pkt->data;
   my $dest = $self->{dest};
   if (!$self->{sock}->send ($pkt->data, 0, $self->{dest})) {
@@ -291,8 +291,11 @@
     defined $packet->answer) {
     my $header = $packet->header;
-    my $id = $header->id;
-
+    my @questions = $packet->question;
+    my $ques = $questions[0];
+    my $host = $ques->qname;
+    my $nid = $header->id;
+    my $id = substr(sha1_base64($host . $nid), -8);
     # dbg("dns: reply id=$id");
     my $cb = delete $self->{id_to_callback}->{$id};
Re: Question about dnsbl.t test
Daniel Quinlan wrote: It's not used in the t test itself. Thanks, that helps. I suspect that whatever is causing the hang in bug 4278 has a symptom of a DNS query failing without hanging when it doesn't hang. Now that I know that the $bind variable has nothing to do with it I can track that down. -- sidney
Re: uridnsbl: bogus rr run ...
Loren Wilton wrote: How about a simple debug printout of the id value sent and the id value received? Maybe it is as simple as the id matching code is failing. That's definitely a better idea considering that there is a bug in the patch I posted that prevents any of the DNS stuff from working :-). On the other hand, it does look like the id matching code is working and it is difficult to see just from looking at tons of debug logs if IDs are getting reused across processes and getting mixed up through use of the same port. I'll see if I can get the sha1 version working better in case Theo is inclined to try it to see what it does. -- sidney
Re: uridnsbl: bogus rr run ...
This is the corrected patch that ensures that IDs are not colliding by including the host name in an SHA1 hash with the 16 bit ID counter. It is written a bit crudely, but if Theo or someone else who is seeing the problem would try this in a mass test it would demonstrate whether the problem has anything to do with this: -- sidney

Index: lib/Mail/SpamAssassin/Dns.pm
===================================================================
--- lib/Mail/SpamAssassin/Dns.pm (revision 164570)
+++ lib/Mail/SpamAssassin/Dns.pm (working copy)
@@ -22,6 +22,7 @@
 use Mail::SpamAssassin::Conf;
 use Mail::SpamAssassin::PerMsgStatus;
 use Mail::SpamAssassin::Constants qw(:ip);
+use Digest::SHA1 qw(sha1_base64);
 use File::Spec;
 use IO::Socket;
 use IPC::Open2;
@@ -145,7 +146,10 @@
   return $self->{resolver}->bgsend($host, $type, undef, sub {
     my $pkt = shift;
-    $self->{dnsfinished}->{$pkt->header->id} = $pkt;
+    my $h = $host;
+    $h =~ s/\.$//;
+    my $id = substr(sha1_base64($h . $pkt->header->id), -8);
+    $self->{dnsfinished}->{$id} = $pkt;
   });
 }
Index: lib/Mail/SpamAssassin/DnsResolver.pm
===================================================================
--- lib/Mail/SpamAssassin/DnsResolver.pm (revision 164570)
+++ lib/Mail/SpamAssassin/DnsResolver.pm (working copy)
@@ -45,7 +45,7 @@
 use Mail::SpamAssassin::Logger;
 use IO::Socket::INET;
-
+use Digest::SHA1 qw(sha1_base64);
 our @ISA = qw();
 # a counter value to use for DNS ID numbers in new_dns_packet().
@@ -243,8 +243,8 @@
   return if $self->{no_resolver};
   my $pkt = $self->new_dns_packet($host, $type, $class);
-
-  my $id = $pkt->header->id;
+  $host =~ s/\.$//;
+  my $id = substr(sha1_base64($host . $pkt->header->id), -8);
   my $data = $pkt->data;
   my $dest = $self->{dest};
   if (!$self->{sock}->send ($pkt->data, 0, $self->{dest})) {
@@ -291,8 +291,11 @@
     defined $packet->answer) {
     my $header = $packet->header;
-    my $id = $header->id;
-
+    my @questions = $packet->question;
+    my $ques = $questions[0];
+    my $host = $ques->qname;
+    my $nid = $header->id;
+    my $id = substr(sha1_base64($host . $nid), -8);
     # dbg("dns: reply id=$id");
     my $cb = delete $self->{id_to_callback}->{$id};
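The keying idea in the patch can be tried on its own, outside SpamAssassin. The following is only a minimal sketch under stated assumptions: it uses the core Digest::SHA module in place of Digest::SHA1 (both export sha1_base64), the helper name query_key and the %pending table are hypothetical, and the packet ID 4242 is made up. The two host names are the ones from the log entry discussed earlier in the thread.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_base64);   # assumption: core module swapped in for Digest::SHA1

# Hypothetical helper: build a lookup key from the queried host name and the
# 16-bit packet ID. The trailing dot is stripped so that "example.com." and
# "example.com" give the same key on the query side and the reply side.
sub query_key {
    my ($host, $id) = @_;
    $host =~ s/\.$//;
    return substr(sha1_base64($host . $id), -8);
}

my %pending;    # key -> marker for the query, playing the role of id_to_callback

# Two outstanding queries that happen to share the (made-up) packet ID 4242.
$pending{ query_key('usafreemerchantsource.com.multi.surbl.org', 4242) } = 'surbl';
$pending{ query_key('dns7.hichina.com.', 4242) }                         = 'ns';

# A reply carrying ID 4242 plus its question name selects only its own entry.
my $who = delete $pending{ query_key('dns7.hichina.com', 4242) };
print "matched: $who\n";
```

Because the key is derived from both the question and the ID, a reply whose ID collides with another outstanding query still only finds the entry for the host it actually answers.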
Re: uridnsbl: bogus rr run ...
Matt Sergeant wrote: May be a problem with forking. Here's part of the fork replacement I use in my code that uses the single-packet-DNS stuff: Justin's code generates a number from the pid to initialize the ID counter and keeps track of it itself instead of relying on the Net::DNS code. Are there some systems in which fork does not result in a new pid? Is it the case that the socket created in each process would use a different source port on the local host? I don't see how there can be so many collisions without both the pid and the source port being the same. -- sidney
Re: uridnsbl: bogus rr run ...
Theo Van Dinter wrote: The patch does make things *much* slower though, around 3x: [...] Without the patch, lots of issues starting after 80%. I don't claim that the patch is the most efficient way of dealing with it... I just wanted to use SHA1 to ensure that there was no chance of an ID collision. I think we have now verified that ID collision is the likely proximate cause of the problem. Still, can three SHA1 calculations compare to the time it takes for a DNS query? I don't see how the computation would slow things down by a factor of three. Perhaps what you are seeing is the difference in wall clock time between processing a reply to an old packet when it arrives right away vs rejecting those packets and waiting for the actual reply. If that's what's happening you are not going to get the faster time when everything works, as it is the nameserver's response time that is slowing down the run. The faster time is perhaps just a symptom of the bug? There is still the question of where the collisions are coming from. Here's another idea -- Instead of using substr(sha1_base64($host . $id), -8) use something that combines the pid and id into a six byte string in the three places in the code where sha1 is used. That will be faster than using SHA1, which will let you know if the slowdown is due to computation or waiting for good packets to arrive, and it will let you know if the problem is with different processes using the same source port for sending the UDP queries. If it is the latter, we may be able to avoid the collisions if we are better about picking the source ports. Are you up for some more few-hour tests? :-) -- sidney
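One way to check the question about computation cost is to time the hash by itself. This is a rough sketch, not a measurement from the thread: the iteration count and the host name are arbitrary, and Digest::SHA (core) stands in for Digest::SHA1. The per-call figure it prints can then be compared with whatever round-trip time the local nameserver shows.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_base64);   # assumption: core module swapped in for Digest::SHA1
use Time::HiRes qw(time);

my $n     = 100_000;               # arbitrary iteration count
my $start = time;
my $key;
for my $id (1 .. $n) {
    # Same expression shape as in the patch: hash host name plus 16-bit ID.
    $key = substr(sha1_base64('example.com' . ($id & 0xffff)), -8);
}
my $elapsed = time - $start;

printf "%d hashes in %.3f s (%.2f microseconds each)\n",
    $n, $elapsed, $elapsed / $n * 1e6;
```

If the per-hash time turns out to be far below a network round trip, the slowdown is more likely to come from waiting for the real replies than from the hashing.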
Re: uridnsbl: bogus rr run ...
Sidney Markowitz wrote: use something that combines the pid and id Brain fade... This patch works by matching information that is in the reply packet to information in the query packet, which means it has to use the host name and the packet ID. Duh! Sorry. Still, we could try some debug log output to determine if the different processes are using the same source ports. I don't see how we could have collisions in the ID unless the source ports are the same. If that's it, we would not have to use the host name to ensure that the reply matches the query if we had a way of making the source ports different across processes. Could you run with debug output that shows the pid, packet ID and source port for the packets that are created in DnsResolver in a run that demonstrates the bug? -- sidney
Re: uridnsbl: bogus rr run ...
I haven't been running mass-checks until now, but I just tried it with svn trunk and got a couple of bogus rr warnings between the 50% and 60% marks so far. It's taken two and a half hours to get that far, so this is a very slow process. I just shut down the vmware session that was running the Windows and Cygwin botslaves on that machine, and I hope that speeds things up a bit. It's a 1 GHz Athlon machine running Fedora Core 3. In any case, it looks like I'll be able to run my own painfully slow tests to try things out. -- sidney
Re: uridnsbl: bogus rr run ...
Matt Sergeant wrote: I didn't think you could do that because in newer versions of Net::DNS the id is a lexical variable. The only way to reinitialise it is to reload the module. If I remember it correctly, Justin's code keeps its own counter and sets the packet ID after creating the packet, making it independent of Net::DNS's counter. I guess that would break down if there are any uses of Net::DNS by the same process that do not go through his code. If that is what is happening and it results in ID collision, the fix would be to use code like yours to reload the module and rely on its own counter. I'll try that now that I can reproduce the problem myself (painful as it is). -- sidney
Re: uridnsbl: bogus rr run ...
Sidney Markowitz wrote: I guess that would break down if there are any uses of Net::DNS by the same process that do not go through his code. grep doesn't find any other use of Net::DNS :-( I just got another 10 bogus rr hits between the 60% and 70% marks on my mass test run. I wonder what it could mean that it happens more towards the end of a run that takes so long. Could nameservers take on the order of minutes or a half hour to send back a UDP reply to a query? At least now I know that the problem is reproducible here. Even though I can't figure out why reloading Net::DNS should make a difference, I'll try it just in case. -- sidney
Re: uridnsbl: bogus rr run ...
Matt Sergeant wrote: May be a problem with forking

Do you think that this code fragment I see in SpamAssassin.pm should work as well as your fork code, or could relying on this be part of the problem?

sub init {
  my ($self, $use_user_pref) = @_;

  # Allow init() to be called multiple times, but only run once.
  if (defined $self->{_initted}) {
    # If the PID changes, reseed the PRNG and the DNS ID counter
    if ($self->{_initted} != $$) {
      $self->{_initted} = $$;
      srand;
      $self->{resolver}->reinit_post_fork();
    }
    return;
  }

  # Note that this PID has run init()
  $self->{_initted} = $$;

-- sidney
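The PID check in that fragment can be exercised in isolation. What follows is a hedged sketch only: the ForkAware package and its reinit_count field are hypothetical stand-ins, and the counter increment takes the place of the real reinit_post_fork() call. It shows that the reinitialization happens in a child only when the child calls init() again after the fork.

```perl
use strict;
use warnings;

package ForkAware;   # hypothetical class, not part of SpamAssassin

sub new {
    my $class = shift;
    return bless { reinit_count => 0 }, $class;
}

sub init {
    my ($self) = @_;
    if (defined $self->{_initted}) {
        # Same pattern as the fragment above: act only if the PID changed.
        if ($self->{_initted} != $$) {
            $self->{_initted} = $$;
            srand;                       # reseed the PRNG in the new process
            $self->{reinit_count}++;     # stands in for reinit_post_fork()
        }
        return;
    }
    $self->{_initted} = $$;
    return;
}

package main;

my $obj = ForkAware->new;
$obj->init;
$obj->init;    # same process: nothing is reinitialized

pipe(my $reader, my $writer) or die "pipe failed: $!";
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    close $reader;
    $obj->init;                          # child: PID differs, so reinit runs
    print {$writer} $obj->{reinit_count};
    close $writer;
    exit 0;
}
close $writer;
my $child_count = <$reader>;
close $reader;
waitpid($pid, 0);

print "parent reinits: $obj->{reinit_count}, child reinits: $child_count\n";
```

The open question for the real code is therefore whether every forked worker is guaranteed to pass through init() before it sends its first DNS query.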
Re: uridnsbl: bogus rr run ...
Theo Van Dinter wrote: I'm trying a small patch which basically calls the reinit function when the counter wraps to 0, as well as using rand when initializing. This way it'll get a new random starting point and a new socket occasionally. I think I understand the problem now. It's similar to what you said. I noticed when debugging t/dnsbl.t that the one message in it generates 52 DNS queries. When there are tens of thousands of messages in a mass check and -j=4, there are going to be several wraparounds of the 16 bit ID. Apparently nameserver responses can arrive quite late. My guess about the slowdown you saw when using the sha1 patch is that while it avoided errors from collisions, all the old reply packets were still read and hashed before being discarded. A fix would be to close and reopen the socket at each message, or as you suggested when the counter wraps. But it should not be when the counter wraps to zero, it should be when it wraps to its initial value. I think it would be better to create the new socket with each message. If old replies are arriving as they seem to, wouldn't it be more efficient to not have a listener on the socket when they arrive? -- sidney
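The wraparound argument is easy to check with a little arithmetic. This is a sketch under assumptions: the 52 queries per message is the figure seen for the one message in t/dnsbl.t and real mail will vary, and the starting ID of 65000 is made up to stand in for a random initial value.

```perl
use strict;
use warnings;

my $id_space        = 2**16;   # number of distinct 16-bit packet IDs
my $queries_per_msg = 52;      # figure seen for the one message in t/dnsbl.t
my $msgs_per_wrap   = $id_space / $queries_per_msg;
printf "one wraparound roughly every %d messages per process\n", $msgs_per_wrap;

# Compare "wraps to zero" with "wraps to the initial value" for a counter
# that starts at a (hypothetical) random point near the top of the range.
my $initial = 65000;
my $next    = $initial;
my ($calls, $zero_at, $initial_at) = (0, undef, undef);
until (defined $initial_at) {
    $next = ($next + 1) & 0xffff;
    $calls++;
    $zero_at    = $calls if $next == 0 && !defined $zero_at;
    $initial_at = $calls if $next == $initial;
}
print "counter reaches 0 after $zero_at IDs, its initial value after $initial_at IDs\n";
```

With this starting point the counter passes zero after only 536 IDs, while the first possible reuse of an ID comes after all 65536 have been handed out, which is why the reset should be tied to the initial value rather than to zero.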
Re: uridnsbl: bogus rr run ...
Sidney Markowitz wrote: I think it would be better to create the new socket with each message. If old replies are arriving as they seem to, wouldn't it be more efficient to not have a listener on the socket when they arrive? I got confused when I reread this, so I thought I should clarify it. If the socket is not changed with each message, then a process sends out queries on a port with one message, then continues to send out queries on the same port in subsequent messages. The ID is incremented and the socket is changed when the ID wraps, so there are no collisions. However, packets still arrive in reply to queries sent for old messages. Until the ID wraps, the replies are received, the ID is not found in the pending list, and the packet is discarded. There is no collision problem, but processing may take a lot longer than if a new socket is created for each message causing the old replies to find no listener on their port. -- sidney
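The discard path described here can be sketched with a plain hash as the pending list. All names below (send_query, handle_reply, the counters) are hypothetical, and the IDs and answers are made up; the point is only that a late reply whose ID is no longer pending still has to be read and looked up before it is dropped.

```perl
use strict;
use warnings;

my %pending;                            # packet ID -> callback for outstanding queries
my ($delivered, $discarded) = (0, 0);
my @answers;

# Hypothetical helper: register a callback for a query we just sent.
sub send_query {
    my ($id, $cb) = @_;
    $pending{$id} = $cb;
    return;
}

# Hypothetical helper: every packet that arrives on the socket lands here,
# including replies to queries from messages that were finished long ago.
sub handle_reply {
    my ($id, $answer) = @_;
    my $cb = delete $pending{$id};
    if (!$cb) {
        $discarded++;                   # not pending: the work of reading it was wasted
        return 0;
    }
    $cb->($answer);
    $delivered++;
    return 1;
}

send_query(101, sub { push @answers, $_[0] });
handle_reply(101, '127.0.0.2');         # reply for the current message
handle_reply(57,  '127.0.0.4');         # late reply for an old message

print "delivered=$delivered discarded=$discarded\n";
```

With a fresh socket per message, the second packet would find no listener on its port and would never reach this code at all.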
Re: uridnsbl: bogus rr run ...
I'm going to respond to your and John Gardiner Myers' replies in the bug 4260 discussion to keep everything tracked there now that I've re-opened the bug. -- sidney
Re: uridnsbl: bogus rr run ...
Loren Wilton wrote: Depending on the value of the parameter that Perl is deducing from that statement, you may or may not be getting the results you expect. From the doc: srand Sets the random number seed for the rand operator. If EXPR is omitted, uses a semi-random value based on the current time and process ID, among other things. Since this call is only done when the process id is different (i.e., in a fork) then srand with no arguments is correct for initializing rand for the process. -- sidney
Re: Moving on to 3.0.4
Warren Togami wrote: Why bother pushing another tarball just for a single patch that affects only one distribution? If I understand the preceding discussion correctly this is not a matter of release early, release often carried to an extreme. It is an abort of the release process for 3.0.3 after the version number has been frozen into a tarball but before it has been announced. The idea is to abort the release in order to accommodate another patch that would have gone into 3.0.3 if there had been a 24 hour waiting period for final votes to be collected. Michael, is that a correct assessment of the situation? -- sidney
failed test
I just saw this in a make test in Win32 that I am running right now. I'm posting this to sa-dev because I have to go to sleep before the make test finishes and so cannot see if it dies the same in Cygwin or elsewhere, and I can't look at it right now:

t\meta..'..' is not recognized as an internal or external command, operable program or batch file.
parse-rules-for-masses failed! at t\meta.t line 42.
tmp/rules.pl is unparseable: Can't locate log/rules-0.pl in @INC (@INC contains: ../blib/lib D:\sasvn\trunk\blib\lib D:\sasvn\trunk\blib\arch C:/Perl/lib C:/Perl/site/lib . C:/Perl/lib C:/Perl/site/lib .) at t\meta.t line 45.
t\meta..dubious
Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 1-2
Failed 2/2 tests, 0.00% okay
Another failed test in Win32
Sleep and kids don't always go together. Here's the other test that failed in Win32, posted here in case anyone can do anything with it. It works in Cygwin. After I post, I _will_ sleep...

t\bayessdbm.ok 48/52# Failed test 49 in t\bayessdbm.t at line 262
t\bayessdbm.NOK 49# Failed test 50 in t\bayessdbm.t at line 263
t\bayessdbm.NOK 50# Failed test 51 in t\bayessdbm.t at line 264
t\bayessdbm.NOK 51# Failed test 52 in t\bayessdbm.t at line 265
t\bayessdbm.FAILED tests 49-52
Failed 4/52 tests, 92.31% okay
Re: SpamAssassin 3.0.3 Released
Sidney Markowitz wrote: The correct fix for 3.0 branch, assuming that spf.t there is still testing a DNS record over which we have no control Hmm, I looked. It doesn't. I'm downloading 3.0 branch now to see what is wrong. -- sidney
Re: SpamAssassin 3.0.3 Released
Now I remember what happened. We weren't using something like aol.com, we were using spamassassin.org, our real spf record. We changed it to do more of the right thing and broke the test that counted on its old value. I'll look into it more to see if there is a way to make the test work the way it is without breaking the mail. -- sidney
Re: svn commit: r168050 - /spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm
I just did a little experiment. I placed an entry for the ip address of one of my web servers in /etc/hosts (or rather the Windows equivalent of it on my PC) with host name www_host.exam_ple.com. I emailed myself a message containing the text http://www_host.exam_ple.com When I looked at the message in Thunderbird the URL was a hot link. Clicking on it opened my browser looking at the site at that ip address. Even if there are no host names in the SURBL right now with _ in them, if SpamAssassin skips over those, just like Justin said spammers will start using them. -- sidney
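The effect of skipping underscores can be shown with two host-label patterns side by side. This is an illustrative sketch, not the pattern SpamAssassin uses: both regexes and the host_from_url helper are hypothetical, and only the test URL comes from the experiment above.

```perl
use strict;
use warnings;

# Hypothetical label patterns: one follows the letters-digits-hyphen rule
# for host names, the other also lets "_" through.
my $label_strict = qr/[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?/;
my $label_loose  = qr/[A-Za-z0-9_](?:[A-Za-z0-9_-]*[A-Za-z0-9_])?/;

# Hypothetical helper: return the host part of the first http:// URL in the
# text, or undef when the label pattern cannot match a dotted host name.
sub host_from_url {
    my ($text, $label) = @_;
    return $text =~ m{\bhttp://((?:$label\.)+$label)}i ? $1 : undef;
}

my $msg = 'http://www_host.exam_ple.com';

my $strict = host_from_url($msg, $label_strict);
my $loose  = host_from_url($msg, $label_loose);

print 'strict pattern: ', (defined $strict ? $strict : 'no host found'), "\n";
print 'loose pattern:  ', (defined $loose  ? $loose  : 'no host found'), "\n";
```

A mail client that treats the URL as clickable behaves like the loose pattern, so a scanner using the strict one would never look the host up.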
Re: boosting
Frederik Eaton wrote: How are the rule weights for spamassassin generated? There is a method called boosting The rule weights are generated using a single-layer perceptron, as described in the wiki link that Daniel mentioned. I'm writing a paper this semester [I hope :-)] looking at the applicability of the simple methods used by SpamAssassin to some classification problems in microarray gene expression data. I expect to look at boosting along with that, along the lines of Jackson, J. and Craven, M., Learning Sparse Perceptrons, Advances in Neural Information Processing Systems 8 (Conference Proceedings of NIPS*95), 1996 http://www.mathcs.duq.edu/~jackson/bbp.pdf So far I don't think that it has been tried. If anyone has looked at it it would have been Henry Stern, who came up with the perceptron for SpamAssassin rule scoring. -- sidney
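For readers unfamiliar with the term, a single-layer perceptron can be sketched in a few lines. This is the plain textbook update rule on made-up data, not the scoring code that ships with SpamAssassin: the rule names, the six training messages, the learning rate and the epoch limit are all assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Made-up training set: each entry is [ [rule hits as 0/1], label ],
# with label 1 = spam and 0 = ham. Rule names are hypothetical.
my @rules = qw(RULE_A RULE_B RULE_C);
my @train = (
    [ [1, 1, 0], 1 ],
    [ [1, 0, 0], 1 ],
    [ [0, 1, 1], 0 ],
    [ [0, 0, 1], 0 ],
    [ [1, 0, 1], 1 ],
    [ [0, 1, 0], 0 ],
);

my @w    = (0) x @rules;   # one weight ("score") per rule
my $bias = 0;
my $rate = 0.1;            # arbitrary learning rate

# Classify as spam when the weighted sum of the rules that hit is positive.
sub predict {
    my ($hits) = @_;
    my $sum = $bias;
    $sum += $w[$_] * $hits->[$_] for 0 .. $#w;
    return $sum > 0 ? 1 : 0;
}

# Classic perceptron training: nudge the weights only on misclassified examples.
for my $epoch (1 .. 100) {
    my $errors = 0;
    for my $ex (@train) {
        my ($hits, $label) = @$ex;
        my $delta = $label - predict($hits);
        next unless $delta;
        $errors++;
        $w[$_] += $rate * $delta * $hits->[$_] for 0 .. $#w;
        $bias  += $rate * $delta;
    }
    last unless $errors;   # stop once every training message is classified correctly
}

printf "%-7s %+.2f\n", $rules[$_], $w[$_] for 0 .. $#rules;
```

The learned weights play the role of rule scores, and the threshold on the sum plays the role of the spam threshold.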
Re: boosting
Fred wrote: There was similar work being done in the past to identify rules to be grouped into new meta rules, this (w|c)ould achieve similar results. http://bugzilla.spamassassin.org/show_bug.cgi?id=1363 I think I'm missing something here. Are you saying that automatically grouping rules into meta rules that have similar classification properties is equivalent to boosting? Or do you mean that it is another approach that also can improve performance of weak learners? In any case, you have given me an idea for the microarray gene expression problem, so thanks! :-) -- sidney
Re: svn commit: r169047 - in /spamassassin/trunk: masses/corpora/mass-find-nonspam sa-learn.raw tools/speedtest
Theo Van Dinter wrote: -1 Don't use M::SA unless its necessary (no reason to load a bajillion things). Just use M::SA::Message. I see that Mail::SpamAssassin->parse just calls Mail::SpamAssassin::Message->new and returns the Message object. Is this the correct syntax to use then instead of the call to parse() ? my $ma = Mail::SpamAssassin::Message->new({message=>$dataref}); If that's it I'll make the change. -- sidney
Question about a proposed change
Does anyone have any objection to my checking in the following change? It makes the code in Dns.pm independent of the format of the key that is used to check the reply packets so that it will be easier to play with using different keys such as hashes by changing only code in DnsResolver.pm. -- sidney

Index: lib/Mail/SpamAssassin/Dns.pm
===================================================================
--- lib/Mail/SpamAssassin/Dns.pm (revision 169513)
+++ lib/Mail/SpamAssassin/Dns.pm (working copy)
@@ -145,7 +145,8 @@
   return $self->{resolver}->bgsend($host, $type, undef, sub {
     my $pkt = shift;
-    $self->{dnsfinished}->{$pkt->header->id} = $pkt;
+    my $id = shift;
+    $self->{dnsfinished}->{$id} = $pkt;
   });
 }
Index: lib/Mail/SpamAssassin/DnsResolver.pm
===================================================================
--- lib/Mail/SpamAssassin/DnsResolver.pm (revision 169513)
+++ lib/Mail/SpamAssassin/DnsResolver.pm (working copy)
@@ -296,7 +296,7 @@
       return 0;
     }
-    $cb->($packet);
+    $cb->($packet, $id);
     return 1;
   }
   else {
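The intent of that change can be illustrated with a toy resolver. This is a sketch under assumptions, not the DnsResolver.pm code: the TinyResolver package, its _make_key and deliver methods, and the "id/host" key format are all hypothetical. What it shows is that once the resolver hands the key to the callback, the caller never needs to know how the key is built.

```perl
use strict;
use warnings;

package TinyResolver;   # hypothetical stand-in for the resolver class

sub new {
    my $class = shift;
    return bless { id_to_callback => {}, next_id => 1 }, $class;
}

# Only the resolver knows the key format; change it here and nowhere else.
sub _make_key {
    my ($self, $host, $id) = @_;
    return "$id/$host";
}

# Register a callback for a query and return the key it is stored under.
sub bgsend {
    my ($self, $host, $cb) = @_;
    my $id  = $self->{next_id}++;
    my $key = $self->_make_key($host, $id);
    $self->{id_to_callback}->{$key} = $cb;
    return $key;
}

# Simulate a reply arriving: look up the callback and pass it packet and key.
sub deliver {
    my ($self, $host, $id, $packet) = @_;
    my $key = $self->_make_key($host, $id);
    my $cb  = delete $self->{id_to_callback}->{$key} or return 0;
    $cb->($packet, $key);
    return 1;
}

package main;

my %dnsfinished;
my $res = TinyResolver->new;

# The caller stores the packet under whatever key the resolver passes in.
my $key = $res->bgsend('example.com', sub {
    my ($pkt, $id) = @_;
    $dnsfinished{$id} = $pkt;
});

my $ok = $res->deliver('example.com', 1, 'fake packet');
print "stored under key: $key\n" if $ok && exists $dnsfinished{$key};
```

Swapping _make_key for a hash-based key would then require no change in the calling code.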
Re: Question about a proposed change
Justin Mason wrote: looks fine to me -- however there are other calls to that bgsend() method elsewhere. it may need to be made there too. Good point. I forgot to grep to make sure I wasn't missing anything. Make test didn't show problems, but it wouldn't until I actually tried to change from using the packet id to using something else. Grep found sub res_bgsend in URIDNSBL.pm that needed the same two line change as in Dns.pm. There were no others. There is a call to bgsend in sub search in DnsResolver.pm, but it doesn't use the id to verify the replies and so doesn't need any changes. Once everything else is working I would like to change that to make it more robust. I'll wait a while to give everyone else a chance to respond across time zones before I check it in. -- sidney
Re: Humorous to me ...
It's funny except I'm getting one of those challenge messages for each one I send to this list. I don't want to give in to that crap by responding to register my email address with a stranger. I guess that's what blacklist-from is for. I wonder if that service has the ability to whitelist-to a mailing list address and if that person is just being clueless? This is the second mailing list I've encountered this on in the past two weeks, BTW. I hope it isn't a trend. -- sidney
Build broken?
Is the build broken or is it something I screwed up locally? mimeheader.t and uri_html.t are breaking when I run them:

$ t/mimeheader.t
1..2
# Running under perl version 5.008006 for cygwin
# Current time local: Wed May 11 23:12:38 2005
# Current time GMT: Wed May 11 11:12:38 2005
# Using Test.pm version 1.25
/usr/bin/perl -T -w ../spamassassin -C log/test_rules_copy --siteconfigpath log/localrules.tmp -p log/tst.cf -L -t data/nice/004
[2300] warn: config: invalid regexp for rule MIMEHEADER_TEST1: /(?-xism:application/msword)/: Search pattern not terminated
[2300] info: config: SpamAssassin failed to parse line, MIMEHEADER_TEST1 content-type =~ /application/msword/ is not valid for mimeheader, skipping: mimeheader MIMEHEADER_TEST1 content-type =~ /application/msword/
[2300] warn: config: invalid regexp for rule MIMEHEADER_TEST2: /(?-xism:(?i)APPLICATION/MSWORD)/: Unmatched ( in regex; marked by -- HERE in m/-xism:( -- HERE /
[2300] info: config: SpamAssassin failed to parse line, MIMEHEADER_TEST2 content-type =~ m!APPLICATION/MSWORD!i is not valid for mimeheader, skipping: mimeheader MIMEHEADER_TEST2 content-type =~ m!APPLICATION/MSWORD!i
Checking test1
Not found: test1 = MIMEHEADER_TEST1
not ok 1
# Failed test 1 in t/SATest.pm at line 575
Checking test2
Not found: test2 = MIMEHEADER_TEST2
not ok 2
# Failed test 2 in t/SATest.pm at line 575 fail #2
-
$ t/uri_html.t
1..2
# Running under perl version 5.008006 for cygwin
# Current time local: Wed May 11 23:12:56 2005
# Current time GMT: Wed May 11 11:12:56 2005
# Using Test.pm version 1.25
did not find http://neverp4yretail.com/bam/[?]man=mic49
not ok 1
# Failed test 1 in t/uri_html.t at line 52
ok 2
Re: svn commit: r169596 - /spamassassin/trunk/lib/Mail/SpamAssassin/Conf/Parser.pm
Justin Mason wrote: Are there tests in the test suite for the redirector usage case btw? Excuse me if I'm misunderstanding the question in my fog-before-first-coffee of the morning... The redirector patterns are hardcoded in sub try_canon in uri.t, so any change to them in 20_uri.cf has to be copied there. The redirector patterns in 20_uri.cf are tested by one case in uri_html.t which does not appear in uri_text.t. Once it is working, we should probably add a case for each pattern and have them be in both uri_html.t and uri_text.t. -- sidney
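The suggestion above (one case per redirector pattern, exercised in both the HTML and the plain-text tests) can be sketched as a single shared table of cases. This is a Python illustration only, not the Perl test code. The patterns, hostnames, and every function name are hypothetical stand-ins, since the real patterns live in the rules file.

```python
import re

# Hypothetical redirector patterns, standing in for the real rules.
REDIRECTOR_PATTERNS = [
    re.compile(r"^https?://redirect\.example/\?url=(.+)$", re.I),
    re.compile(r"^https?://go\.example/r/(.+)$", re.I),
]

# One table of (source URI, expected target), one row per pattern.
CASES = [
    ("http://redirect.example/?url=http://target.example/a",
     "http://target.example/a"),
    ("http://go.example/r/http://target.example/b",
     "http://target.example/b"),
]


def canonicalize(uri):
    """Return the URI plus any target pulled out by a redirector pattern."""
    found = [uri]
    for pattern in REDIRECTOR_PATTERNS:
        match = pattern.match(uri)
        if match:
            found.append(match.group(1))
    return found


def render_text(uri):
    return "Visit %s today" % uri


def render_html(uri):
    return '<a href="%s">click</a>' % uri


def extract_uris(body):
    """Very rough URI extraction, good enough for this sketch."""
    return re.findall(r"https?://[^\s\"<>]+", body)


def run_cases(render):
    """Run every case through one message format; True means target found."""
    results = []
    for source, expected in CASES:
        targets = []
        for uri in extract_uris(render(source)):
            targets.extend(canonicalize(uri))
        results.append(expected in targets)
    return results


print(run_cases(render_text))
print(run_cases(render_html))
```

Because both formats read the same CASES table, adding a pattern means adding one row rather than editing two test files by hand.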
Re: t/dnsbl.t failing
Theo Van Dinter wrote: I don't know if this is a known issue, but it seems like tests 1-18 fail (of 22) for t/dnsbl.t ... From what I can see, most of the lookups timeout at 15s which blows the tests out of the water. I've been seeing that with varying regularity depending on which network I'm on. It's pretty consistently bad on my home DSL, but works if I run it again immediately after. I guess then the queries are cached somewhere. Could it be that bugzilla just isn't a good machine to be using as the spamassassin.org nameserver as it's too slow to respond? -- sidney
Re: t/dnsbl.t failing
I got some debugging output and it looks like something is quite wrong, but I don't have time to look at it right now. Maybe tonight or tomorrow if nobody else catches it first. -- sidney
Re: Weekly net run, still has issues
Theo, could you try running with the "bogus rr for domain" warn statement in URIDNSBL modified to output $packet and $ent->{id} instead of $packet->header->id? That will make the warning message a bit more verbose, but you aren't seeing that many of them anyway, and it will provide helpful debug information. -- sidney
Re: weekly net run bugs
Justin Mason said: mystery solved ;) Aww, I was looking forward to tracking down a really mysterious bug :) -- sidney
Buildbot question
Does anyone know if we should be able to use the latest version of Buildbot, 0.6.5 with buildbot.spamassassin.org? I know that I could just try it, but I don't want to spend time trying to get it to work only to find that the master has to be upgraded first. -- sidney
Re: Buildbot question
Justin Mason said: yep, should be possible -- create a t/config file that enables it in the buildbot slave's checkout ;) I thought that's under the control of the master. Doesn't the script recreate the entire trunk every time? Oh, of course that would be too expensive. Ok, I'll edit the t/config file. Right now I'm having a problem getting buildbot 0.6.2 running under Cygwin after it upgraded python to 2.4. It dies with a permission denied error trying to switchuid. I haven't decided whether to try to track down the problem or take a chance on upgrading to buildbot 0.6.5 first to see what happens. -- sidney
Re: Buildbot question
Justin Mason said: afaik you can. Ok, I'll try it. First I'll confirm that I can get the 0.6.2 that I have installed running again, as I've had it down for a while. Another question -- can we have a way of enabling network tests for the buildbot runs? I can see how it should be an option, as some people might not want to load their network every time they run an automated test, but I don't mind, and the network stuff should be tested too. -- sidney