[Bug 3828] [review] spamd parent stops accepting requests

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3828





--- Additional Comments From [EMAIL PROTECTED]  2004-11-18 16:23 ---
Good catch, Sebastian!  POSIX::sigaction will, indeed, solve two problems:

- 1. it copes with internal timeouts that use SIGALRM overwriting our own handler

- 2. it lets us always use unsafe signal handlers, which are essential in this
  case to interrupt all possible hangs, including regexp complexity ones.

However, Sys::SigAction also notes that in perl versions prior to 5.8,
POSIX::sigaction does not work correctly.  So I agree with Sidney: we need to
implement code to do the same thing using %SIG in our codebase.

In the %SIG case, #1 will be unavoidable.  But that seems to have worked for
Dallas anyway -- I would surmise because the rules that set alarms (namely
DCC/Pyzor/Razor) run late enough that the error condition this catches has
already happened by that stage, if it was going to happen.

I think we need a new patch that does the same thing as Dallas' patch, but using
functions that switch between %SIG and POSIX::sigaction depending on
$^V.  BTW, any chance someone with a CLA on file could do this?




--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.


[Bug 3917] spamd under Cygwin causes SpamC to report failed sanity check on some messages

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3917





--- Additional Comments From [EMAIL PROTECTED]  2004-11-18 16:42 ---
Narrowing it down just a little bit more: I modified the server in the last
attachment to cache the first message it receives and then keep using that. I
then called it using spamc to send one large message followed by repeatedly
sending a 0 length message. The server kept sending back to spamc the original
large message.

This produces the same intermittent error, truncating in the same place.

That indicates that the sending of the large message to spamd is not a factor in
the truncation of the message that spamd sends out.

My next step in simplifying the test will be to have the server respond to a
client connection by sending back fixed large message, and use something simpler
than spamc as the client to test it.





proposal: an automated rule-qa system

2004-11-19 Thread Justin Mason
So, we were discussing the rules situation -- ie. that we've been pretty
crap at getting rules into the distro. I proposed this, and I think we're
reasonably into the idea as a way to help out.

We add a web-app somewhere that periodically scrapes bugzilla
for bugs on the rules component which contain some token from trusted
users indicating that they contain rules that need testing.

That then extracts rules from attachments/text on that bug, and

- (a) checks out SVN trunk and adds them to the rules dir of that checkout in a
  temporary file
- (b) runs a mass-check on those rules
- (c) does simple lint using spamassassin --lint and
  lint-rules-from-freqs
- (d) does some kind of basic S/O testing
- (e) it may be that we can also check in the rules into SVN for a full
  nightly mass-check from all the people doing those, in which case it
  should come up with the results from that, nicely snipped out of the
  full reports.
- (f) if we do (e), we can even get the results, segmented by the age of
  the corpus used!  in other words, give us a picture of the freqs based
  on how old the messages it was hitting on were.
- (g) -- possibly -- do a quick perceptron run to evaluate if the rule
  overlaps with other rules too much.

Finally, it'll display the results at a given URL -- probably based on the
bug and comment numbers, so it's easily hyperlinkable.

Using bugzilla as the backend is useful, btw, as that gives us

  - threaded discussion of rules
  - contributor CLA status tracking
  - good ways to get lists and overviews of what contributions are
    available and their status
  - gatewayed to mailing list, and viewable via www

Sound useful?  That should at least take some legwork out of rule QA,
and stop us committers being a bottleneck in the process.

--j.


[Bug 3813] RFE: Better packaging for a Windows installable version

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3813

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[EMAIL PROTECTED]







[Bug 3983] adopt Apache preforking algorithm

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3983





--- Additional Comments From [EMAIL PROTECTED]  2004-11-18 19:35 ---
Created an attachment (id=2527)
 -- (http://bugzilla.spamassassin.org/attachment.cgi?id=2527&action=view)
implementation

here's the impl.   I'll check this into trunk shortly unless anyone gives me a
-1.  ;)





[Bug 3967] [review] large numbers of redirectors can cause slowness in a few rules

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3967





--- Additional Comments From [EMAIL PROTECTED]  2004-11-18 21:42 ---
Subject: Re:  [review] DOS E-Mail Message

> +1 on those two rule fixes

+1 ditto

Do we want to add a protective rule for long URLs like we did for long
headers?







[Bug 3771] PostgreSQL Specific Bayes Storage Module

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3771





--- Additional Comments From [EMAIL PROTECTED]  2004-11-18 22:36 ---


On 11/18/2004 3:38 PM, Michael Parker wrote:
> On Thu, Nov 18, 2004 at 06:53:19AM -0800, Rupa Schomaker wrote:
>
>> Some questions:
>>
>> Is bytea really necessary?  If I follow the path of the patch, the bytea
>> change was done prior to adding the index.  Since the tokens are binary
>> data it is probably more correct though, especially if one has an
>> encoding other than SQL_ASCII set for the DB...
>
> Yes, as far as I can tell from the documentation.  The fact that we're
> storing the binary value makes it necessary.  If I'm misinformed, then
> feel free to point out where in the documentation.

My understanding is that isn't necessary but it is more fragile (subject
to the database encoding and the client encoding).  This was discussed
recently on one of the postgres groups... Looking:

http://groups.google.com/groups?hl=en&lr=&selm=cndnbc%24otp%241%40FreeBSD.csie.NCTU.edu.tw
Message-ID: [EMAIL PROTECTED]

===
From: Tom Lane ([EMAIL PROTECTED])
Subject: Re: [ADMIN] evil characters #bfef cause dump failure
Date: 2004-11-16 12:19:06 PST

[snip]
BTW, SQL_ASCII is not so much an encoding as the absence of any encoding
choice; it just passes 8-bit data with no interpretation.  So it's not
*that* unreasonable a default.  You can store UTF8 data in it without
any problem, you just won't have the niceties like detection of bad
character sequences.

   regards, tom lane
===

Leave it as bytea...

>> What do you use to benchmark changes?  I'm willing to experiment but
>> would like to have some reproducible results for ya...
>
> It's not really ready for real world consumption and time has been
> short for getting it ready.  You can read a little about it here:
> http://wiki.apache.org/spamassassin/BayesBenchmark
>
> Hopefully, I'll get some free time soon and get it into the SA tree.

I'll take a look at it when I get a chance.

Some more testing/observations with sa-learn only.  BTW: do you want me
to move this discussion to the ticket in bugzilla?  Or we can wait 'till
I/we have a summary...

General notes:

1) Why not a unique index that mimics the primary key (though do it in
token,id order not id,token)?  Won't matter in my case (since I run as
one user) and probably doesn't matter at all unless running with lots 'n
lots of users...

2) bayes_seen.msgid should be type 'text' -- sa-learn (and others) don't
truncate to 200.

3) I also get differences in the backup file.

-rw-r--r--  1 rupa users 13047214 Nov 18 13:23 backup_dbm.txt
-rw-r--r--  1 rupa users 13047202 Nov 18 17:16 backup_new.txt

An actual diff is probably meaningless since I doubt order is guaranteed
between a dbm and sql.  I did the diff and quickly gave up.  I suppose
the data could be ordered from both sources and then compared?
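
A minimal sketch of the ordered comparison suggested above, written in Python rather than taken from the SpamAssassin tree. It assumes both dumps are line-oriented text files (the names backup_dbm.txt and backup_new.txt come from the listing above; `unordered_diff` and `load_lines` are hypothetical helpers) and compares them as multisets of lines, so row order is ignored but duplicate rows still count.

```python
from collections import Counter


def load_lines(path):
    """Read a dump file into a list of lines without trailing newlines.

    latin-1 is an assumption: it never fails on arbitrary bytes, which
    matters if the dump contains raw token data.
    """
    with open(path, "r", encoding="latin-1") as handle:
        return [line.rstrip("\n") for line in handle]


def unordered_diff(lines_a, lines_b):
    """Return (only_in_a, only_in_b), ignoring order but keeping duplicates."""
    count_a, count_b = Counter(lines_a), Counter(lines_b)
    only_a = sorted((count_a - count_b).elements())
    only_b = sorted((count_b - count_a).elements())
    return only_a, only_b
```

Calling `unordered_diff(load_lines("backup_dbm.txt"), load_lines("backup_new.txt"))` would then list only the rows that genuinely differ between the two backups.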

Some 'benchmarks' of sa-learn.  Single run:

bayes_seen: 202863 rows
bayes_token: 150842 rows

System is:
model name  : AMD Athlon(tm) XP 2600+
MemTotal:  1031916 kB
debian unstable

With a fairly large workload from a memory standpoint but CPU generally
fairly idle.

Postgres hasn't been tuned much -- have to reset the stats in postgres
and do some analysis...

1) Shipped config with msgid='text' on my backup file:

real    24m35.663s

2) Shipped config with indices added:

real    32m33.931s

Ekk!  Analyze; delete; rerun:

Still 30min.

hrmmm..

But I know it runs better in normal operation.  Oh well *shrug* must be
the index update even though the check constraint doesn't need a table scan.

3) Patch (2004-10-31 18:53) applied, re-create tables:

real    14m29.793s

Analyze, delete, rerun:

15m.

A bit better.

BTW: Using dbm the full restore takes 23s...

Time to add some small amount of stats to sa-learn (or underlying) to
see where we're spending time...  Added some more timing points and
dbg() output to SQL.pm.  Needs Time::HiRes which is bundled in perl
5.8.x but is an optional add-on for earlier stuff.

Ok, with my large set:

Token inserts start at around 1-2s per 1000 and rises to 7-8s per 1000.

Seen inserts start at around 1s per 1000 and stay there.

I can think of ways to optimize sa-learn (do it all in one TX rather
than 1 TX per insert; assume an insert rather than using the generic
query-then-insert path for _put_token()), but the restore is only done
once anyway and the changes would be invasive rather than just re-using
existing logic.  Not worth it.

It is however a reasonable test of the insert/update logic of learning a
single message (whether auto-learn or manual).  Doesn't test the query
side though...
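
To illustrate the "one TX rather than one TX per insert" idea mentioned above, here is a rough sketch. It is not the Perl DBI code in SQL.pm: it uses Python's stdlib sqlite3 module as a stand-in for PostgreSQL, a simplified two-column bayes_token table, and a hypothetical `restore_tokens` helper. The only point is that rows are committed once per batch instead of once per row.

```python
import sqlite3


def restore_tokens(conn, rows, batch_size=1000):
    """Insert (id, token) rows, committing once per batch.

    Returns the number of commits issued, so the saving over
    one-commit-per-row is easy to see.
    """
    cur = conn.cursor()
    pending = 0
    commits = 0
    for row in rows:
        # sqlite3 opens a transaction implicitly before the first INSERT
        cur.execute("INSERT INTO bayes_token (id, token) VALUES (?, ?)", row)
        pending += 1
        if pending >= batch_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # flush the final partial batch
        commits += 1
    return commits
```

With 2500 rows and the default batch size this issues 3 commits instead of 2500. Whether that is worth the invasive change is the judgment call made in the message above.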

 
 Michael

--
 -Rupa






[Bug 3771] PostgreSQL Specific Bayes Storage Module

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3771





--- Additional Comments From [EMAIL PROTECTED]  2004-11-18 22:37 ---

Oh, forgot something.  The patch doesn't create an index on
bayes_seen(msgid) -- probably should.

--
 -Rupa







[Bug 3771] PostgreSQL Specific Bayes Storage Module

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3771





--- Additional Comments From [EMAIL PROTECTED]  2004-11-18 22:48 ---
Subject: Re:  PostgreSQL Specific Bayes Storage Module

On Thu, Nov 18, 2004 at 10:36:45PM -0800, [EMAIL PROTECTED] wrote:
> >
> >> Some questions:
> >>
> >> Is bytea really necessary?  If I follow the path of the patch, the bytea
> >> change was done prior to adding the index.  Since the tokens are binary
> >> data it is probably more correct though, especially if one has an
> >> encoding other than SQL_ASCII set for the DB...
> >
> > Yes, as far as I can tell from the documentation.  The fact that we're
> > storing the binary value makes it necessary.  If I'm misinformed, then
> > feel free to point out where in the documentation.
>
> My understanding is that isn't necessary but it is more fragile (subject
> to the database encoding and the client encoding).  This was discussed
> recently on one of the postgres groups... Looking:
>
> http://groups.google.com/groups?hl=en&lr=&selm=cndnbc%24otp%241%40FreeBSD.csie.NCTU.edu.tw
> Message-ID: [EMAIL PROTECTED]
>
> ===
> From: Tom Lane ([EMAIL PROTECTED])
> Subject: Re: [ADMIN] evil characters #bfef cause dump failure
> Date: 2004-11-16 12:19:06 PST
>
> [snip]
> BTW, SQL_ASCII is not so much an encoding as the absence of any encoding
> choice; it just passes 8-bit data with no interpretation.  So it's not
> *that* unreasonable a default.  You can store UTF8 data in it without
> any problem, you just won't have the niceties like detection of bad
> character sequences.
>
>    regards, tom lane
> ===
>
> Leave it as bytea...
>

Interesting, I think my main concern was the fact that BYTEA was the
only way to make sure you got any trailing whitespace (which we do
get) so it had to be used.  Like I said, I'm far from the postgresql
expert so I'm gladly proven wrong.

> 1) Why not a unique index that mimics the primary key (though do it in
> token,id order not id,token)?  Won't matter in my case (since I run as
> one user) and probably doesn't matter at all unless running with lots 'n
> lots of users...

Didn't realize it was necessary.

> 2) bayes_seen.msgid should be type 'text' -- sa-learn (and others) don't
> truncate to 200.

We should just truncate in the code, maybe it needs to be a little
bigger but add a hard substr to the code anyway.

> 3) I also get differences in the backup file.
>
> -rw-r--r--  1 rupa users 13047214 Nov 18 13:23 backup_dbm.txt
> -rw-r--r--  1 rupa users 13047202 Nov 18 17:16 backup_new.txt
>
> An actual diff is probably meaningless since I doubt order is guaranteed
> between a dbm and sql.  I did the diff and quickly gave up.  I suppose
> the data could be ordered from both sources and then compared?
>

This is a problem, see the bug for a short discussion.  There are for
sure some differences in output that should not be there.

> Ok, with my large set:
>
> Token inserts start at around 1-2s per 1000 and rises to 7-8s per 1000.
>
> Seen inserts start at around 1s per 1000 and stay there.
>

I started running the auto analyzer deal to keep the statistics
up-to-date, this helps keep from trailing off later in the run.

> I can think of ways to optimize sa-learn (do it all in one TX rather
> than 1TX per insert), assume an insert rather than using the generic
> query then insert path for _put_token() but the restore is only done
> once anyway and the changes would require some invasive changes rather
> than just re-using existing logic.  Not worth it.

Yeah, it would require a fairly large change all around.

Michael






[Bug 3771] PostgreSQL Specific Bayes Storage Module

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3771





--- Additional Comments From [EMAIL PROTECTED]  2004-11-18 23:42 ---
On 11/18/2004 10:48 PM, [EMAIL PROTECTED] wrote:

>> Leave it as bytea...
>
> Interesting, I think my main concern was the fact that BYTEA was the
> only way to make sure you got any trailing whitespace (which we do
> get) so it had to be used.  Like I said, I'm far from the postgresql
> expert so I'm gladly proven wrong.

Given that we can't guarantee db encoding (someone mentioned that RH Fedora Core
ships with encoding enabled) we're best off using bytea.  Ignore that I brought
this up. :)

>> 1) Why not a unique index that mimics the primary key (though do it in
>> token,id order not id,token)?  Won't matter in my case (since I run as
>> one user) and probably doesn't matter at all unless running with lots 'n
>> lots of users...
>
> Didn't realize it was necessary.

On second pass, it isn't.  I just started perusing the statistics tables in my
system and found that there were two sets of indexes: the ones for the foreign
key and the ones I created manually.  The system-created PK index is hidden (at
least in pgAdmin) -- my mistake.

In any case, the system index is built on the order of the keys -- best to swap
the keys (token,id) and (seen,id).

Given we have a unique index on these fields and in the right order we should be
OK as is.

>> 2) bayes_seen.msgid should be type 'text' -- sa-learn (and others) don't
>> truncate to 200.
>
> We should just truncate in the code, maybe it needs to be a little
> bigger but add a hard substr to the code anyway.

For fields under 255 chars there is no penalty (or storage weirdness) using text
vs varchar(200).  Postgres stores it as a 1-byte length and then data, and the
field is no longer than that.  If it goes over then I believe it moves the data
to the toast table -- so a slight penalty there.  I think I saw 5 greater than
200 chars out of 202863.  dbm obviously stores the full length.  It is mysql that
silently ignores (or so I'm told, I can't verify).

>> 3) I also get differences in the backup file.
>> [snip]
>
> This is a problem, see the bug for a short discussion.  There are for
> sure some differences in output that should not be there.

I did another run with debugging on and noticed that some of the seen lines got
discarded.  That might account for the difference when strictly looking at file
sizes.

> I started running the auto analyzer deal to keep the statistics
> up-to-date, this helps keep from trailing off later in the run.

Ah, I'll play on the next import (one index, just the PK one).

-- 
 -Rupa







[Bug 3771] PostgreSQL Specific Bayes Storage Module

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3771





--- Additional Comments From [EMAIL PROTECTED]  2004-11-19 00:03 ---
Spoke too soon.  Relying on the PK index in either order resulted in seqscans in
all cases.  Very weird -- not gonna track that down.  Created non-unique indices
on token and msgid and all is back to where it was...  Analyze didn't 'fix' the
explain plan in any perceptible way.





RE: proposal: an automated rule-qa system

2004-11-19 Thread Chris Santerre


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 18, 2004 9:16 PM
To: dev@spamassassin.apache.org
Subject: proposal: an automated rule-qa system

*SNIP*

We add a web-app somewhere that periodically scrapes bugzilla
for bugs on the rules component which contain some token from trusted
users indicating that they contain rules that need testing.

That then extracts rules from attachments/text on that bug, and

*SNIP*
Sound useful?  That should at least take some legwork out of rule QA,
and stop us committers being a bottleneck in the process.

+1   ;)

Although the ninjas have been really slow to find new rules, as spam is
getting caught so well now. 

--Chris


RE: proposal: an automated rule-qa system

2004-11-19 Thread [EMAIL PROTECTED]
I think this is a great idea.  If this is going to be automated, it may also
be useful to have buckets for the rules based on the output of mass-checks.

Something like aggressive rules, neutral rules, and lenient rules, based on
how they catch spam/ham.

That way someone grabbing them who may not have full knowledge of how
everything works does not turn around and say "I downloaded X rule from SVN
and it caught Y ham, why?"

Ron





[Bug 3984] New: Use of uninitialized value in pattern match

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3984

   Summary: Use of uninitialized value in pattern match
   Product: Spamassassin
   Version: SVN Trunk (Latest Devel Version)
  Platform: Other
OS/Version: other
Status: NEW
  Severity: normal
  Priority: P5
 Component: spamassassin
AssignedTo: dev@spamassassin.apache.org
ReportedBy: [EMAIL PROTECTED]


sa-learn --spam --mbox --showdots  spam.txt

got this...

Use of uninitialized value in pattern match (m//) at
/Library/Perl/5.8.5/Mail/SpamAssassin/Message.pm line 230.





[Bug 3975] Can not do make test

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3975





--- Additional Comments From [EMAIL PROTECTED]  2004-11-19 10:59 ---
Looking at the code, I see that the tests use port 48373.  I suppose that an
unprivileged port was needed so that the tests could be run by normal users.
The port is defined in file t/SATest.pm.  So either check for something using
port 48373 or change the test port to something else and try again.
In any case, this is probably not a bug in SpamAssassin, but rather something
somewhat unusual in your system.  You might have better luck asking about this
on the users list.
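
One quick way to check whether something is already holding the test port, as a sketch only: it assumes the tests bind 127.0.0.1 (the port number 48373 is the one quoted above from t/SATest.pm), and `port_in_use` is a hypothetical helper, not part of the test suite.

```python
import errno
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding host:port fails because the address is taken."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # some other problem (permissions, bad address): surface it
    else:
        return False
    finally:
        probe.close()
```

If `port_in_use(48373)` returns True, something else on the machine owns the port and the test port would need changing before running make test again.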





[Bug 3828] [review] spamd parent stops accepting requests

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3828





--- Additional Comments From [EMAIL PROTECTED]  2004-11-19 10:56 ---
here is another one today...

child pid 9638 running for  5 min, 29 sec
 - ALERT: spamd pid 9638 ran for 5 min, 29 sec and was killed!

i have that script running a kill -9 just in case i can't get to the system at
the time.   hopefully i can catch one today and kill -15 it instead.

2004-11-19 12:09:32.391200500 debug: [9638] rules: running header regexp tests;
score so far=8.273
2004-11-19 12:09:32.391202500 debug: [9638] auto-whitelist: sql-based connected
to DBI:mysql:logs:localhost:3306
2004-11-19 12:09:32.724190500  <unfinished ...>
2004-11-19 12:09:32.724220500 [pid 11363] <... read resumed> 0x401f8000, 4096) =
? ERESTARTSYS (To be restarted)
2004-11-19 12:09:32.724240500 [pid 11363] --- SIGALRM (Alarm clock) ---

again here, the last line was the sql connection for AWL.  half a second later
it pulls a successful AWL, so who knows

2004-11-19 12:09:32.806266500 debug: [11363] auto-whitelist: sql-based connected
to DBI:mysql:logs:localhost:3306
2004-11-19 12:09:32.807102500 debug: [11363] auto-whitelist: sql-based
get_addr_entry: found existing entry for [EMAIL PROTECTED]|ip=8.6
2004-11-19 12:09:32.807149500 debug: [11363] auto-whitelist: sql-based
[EMAIL PROTECTED]|ip=8.6 scores 33/240.266
2004-11-19 12:09:32.807316500 debug: [11363] auto-whitelist: AWL active,
pre-score: 8.273, autolearn score: 8.273, mean: 7.28078787878788, IP: 
8.6.241.121
2004-11-19 12:09:32.808017500 debug: [11363] auto-whitelist: sql-based
add_score: new count: 34, new totscore: 248.539 for [EMAIL PROTECTED]|ip=8.6
2004-11-19 12:09:32.808254500 debug: [11363] auto-whitelist: sql-based finish:
disconnected from DBI:mysql:logs:localhost:3306
2004-11-19 12:09:32.808405500 debug: [11363] auto-whitelist: post auto-whitelist
score: 7.77689393939394

this is twice i have caught a spamd child that has hung where the last debug
line it sent was the auto-whitelist sql connection.   hopefully i'll see another
one soon.





[Bug 3828] [review] spamd parent stops accepting requests

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3828





--- Additional Comments From [EMAIL PROTECTED]  2004-11-19 11:31 ---
Subject: Re:  [review] spamd parent stops accepting requests

On Fri, Nov 19, 2004 at 10:56:40AM -0800, [EMAIL PROTECTED] wrote:
>
> this is twice i have caught a spamd child that has hung where the last debug
> line it sent was the auto-whitelist sql connection.   hopefully i'll see
> another one soon.
>

I'm trying to remember, was there any indication (besides these
latest two) that AWL was involved?

Could be some sort of deadlock in MySQL.  Is there ever a case when
one spamd has hung that another takes a slightly longer amount of time
than usual (but not long enough to trigger the timeout/killer script)?

Michael






[Bug 3828] [review] spamd parent stops accepting requests

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3828





--- Additional Comments From [EMAIL PROTECTED]  2004-11-19 11:53 ---
the last 2 hangs were on awl.. the others i'm not exactly sure because i didn't
have full debugging enabled.   i've been running an strace -f with full debug
for the last 3 days and have seen the 3 hangs.  the one from yesterday i jacked
up the log file... i think the last line on it was something about bayes, but
since i lost some of that data, i didn't want to trust that read.

the weird thing is awl processed fine half a second after that child hung.
there is not much in common here... the time that they hang is around 12pm
(11:39 and 12:09 so far), but nothing else apparent runs then.   mysqld shows no
errors or problems that i can see.

if i get 1 more that indicates awl, i'll shut that down and see if the problems
go away.  personally i don't think it's awl, and i've taken a look at the code
around that dbg and don't see anything that could cause it either.








[Bug 3967] [review] large numbers of redirectors can cause slowness in a few rules

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3967





--- Additional Comments From [EMAIL PROTECTED]  2004-11-19 12:15 ---
Subject: Re:  [review] large numbers of redirectors can cause slowness in a few 
rules

On Thu, Nov 18, 2004 at 09:42:52PM -0800, [EMAIL PROTECTED] wrote:
> Do we want to add a protective rule for long URLs like we did for long
> headers?

I added in T_REDIRS_* to check for >= a certain number of redirections.

Not great results in and of themselves, but ...

  0.001   0.0013   0.0000    1.000   0.48    0.01  T_REDIRS_5
  0.001   0.0013   0.0000    1.000   0.48    0.01  T_REDIRS_4
  0.000   0.0000   0.0000    0.500   0.47    0.01  T_REDIRS_20
  0.000   0.0000   0.0000    0.500   0.47    0.01  T_REDIRS_8
  0.000   0.0000   0.0000    0.500   0.47    0.01  T_REDIRS_10
  0.000   0.0000   0.0000    0.500   0.47    0.01  T_REDIRS_15
  0.003   0.0027   0.0067    0.285   0.47    0.01  T_REDIRS_3
  0.056   0.0113   0.5074    0.022   0.30    0.01  T_REDIRS_2
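
For readers unfamiliar with the S/O column, a small sketch of how it relates to the hit rates. It assumes the usual freqs column order (overall%, spam%, ham%, S/O, rank, score, name), so the T_REDIRS_2 row reads as spam% 0.0113 and ham% 0.5074; `s_o_ratio` is a hypothetical helper, and the 0.5 fallback for rules with no hits is inferred from the zero-hit rows above rather than taken from the mass-check code.

```python
def s_o_ratio(spam_pct, ham_pct):
    """Fraction of a rule's hits that land on spam: spam% / (spam% + ham%)."""
    total = spam_pct + ham_pct
    if total == 0:
        # No hits at all: treat the rule as uninformative (assumption).
        return 0.5
    return spam_pct / total
```

With the T_REDIRS_2 figures this gives about 0.022, matching the table: almost all of that rule's hits are on ham.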







[Bug 3984] Use of uninitialized value in pattern match

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3984





--- Additional Comments From [EMAIL PROTECTED]  2004-11-19 12:22 ---
Subject: Re:  New: Use of uninitialized value in pattern match

On Fri, Nov 19, 2004 at 10:04:08AM -0800, [EMAIL PROTECTED] wrote:
> Version: SVN Trunk (Latest Devel Version)
>
> Use of uninitialized value in pattern match (m//) at
> /Library/Perl/5.8.5/Mail/SpamAssassin/Message.pm line 230.

Hrm.

if (defined $boundary && $message[0] =~ /^--\Q$boundary\E(?:--|\s*$)/) {

So that means $message[0] must be undef...  I just reproduced it, it occurs
with malformed messages when there's a header and absolutely no body or blank
line separator.

I'll commit a patch when I get a free minute.







[Bug 3917] spamd under Cygwin causes SpamC to report failed sanity check on some messages

2004-11-19 Thread bugzilla-daemon
http://bugzilla.spamassassin.org/show_bug.cgi?id=3917

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #2513 is|0                           |1
           obsolete|                            |



--- Additional Comments From [EMAIL PROTECTED]  2004-11-19 14:51 ---
Created an attachment (id=2530)
 -- (http://bugzilla.spamassassin.org/attachment.cgi?id=2530&action=view)
even simpler server that demonstrates the problem

I've attached a simpler server program that demonstrates the problem without
requiring spamc in the test and without a lot of data being sent from the
client to the server. Run this under Cygwin. It will run with -w and -T
options. It takes two optional command line arguments. The first is the ip
address to listen on, defaulting to 127.0.0.1. The second argument is the port
number, defaulting to 783.

This server waits for a client to send something on the port, then it sends
back a 68013 byte message with headers that allow it to work with spamc as the
client.  So you can test it by running it and then running spamc -x -l
repeatedly, looking at the return code and error messages from spamc. You can
also use telnet as a client. The message that the server sends back consists of
1000 identical lines prefixed with line numbers 000 to 999 followed by a line
that says END OF TEST. That makes it easy to see how the data received from
the server is truncated. Just use telnet localhost 783 and press the Enter key
when you get a connection to the server.

For me, this test consistently truncates the received data after 49152 bytes.
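
As an alternative to telnet, a scripted client makes the truncation point easy to measure. This is a sketch under assumptions, not the attached test program: it takes the defaults described above (127.0.0.1, port 783, a final line reading END OF TEST), sends one line to wake the server, and reads until the server closes the connection. `fetch_all` and `summarize` are hypothetical names.

```python
import socket


def fetch_all(host="127.0.0.1", port=783, trigger=b"\r\n", timeout=10.0):
    """Connect, send one line to wake the server, and read until EOF."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(trigger)
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty read: the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


def summarize(data):
    """Report how much arrived and whether the END OF TEST marker was seen."""
    lines = data.splitlines()
    last = lines[-1] if lines else b""
    return {
        "bytes": len(data),
        "last_line": last,
        "complete": last.strip() == b"END OF TEST",
    }
```

Running `summarize(fetch_all())` in a loop against the test server would show the byte count of each response; a truncated reply shows up as `complete` being False and a `bytes` value short of the expected 68013.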






Java client to spamd

2004-11-19 Thread Kurt Humes
I am beginning to build a Java library to act as a client to spamd, without using JNI. Has anyone ever done something similar, and if so, what roadblocks have you come across?

thanks.
	
 
 
 


Re: Java client to spamd

2004-11-19 Thread Justin Mason


Kurt Humes writes:
> I am beginning to build a Java library to act as a client to spamd, not
> using JNI however.  Has anyone ever done something similar and if so
> what are the roadblocks that you have come across.

Kurt, I'm unaware of anything, but it should be very, very
straightforward.

(only (minor) roadblock: there was a bug in whitespace handling at the end
of the server response to one of the request verbs, can't remember which
one, but it's documented in spamd/PROTOCOL.)
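
To give a feel for how little a client has to do, here is a sketch of the request and response framing. It is in Python rather than Java purely for brevity, and the wire format shown (a `CHECK SPAMC/1.2` request line, a `Content-length` header, a `SPAMD/1.1 0 EX_OK` status line and a `Spam:` header) is an assumption that should be checked against spamd/PROTOCOL. `build_check_request` and `parse_response_head` are hypothetical helpers. The parser strips trailing whitespace on every line, which is one defensive way to cope with the whitespace quirk mentioned above.

```python
import re

_STATUS = re.compile(r"^SPAMD/(\S+)\s+(\d+)\s+(.*)$")
_SPAM = re.compile(r"^Spam:\s*(\w+)\s*;\s*(-?[\d.]+)\s*/\s*(-?[\d.]+)$", re.I)


def build_check_request(message, verb="CHECK"):
    """Frame a raw message (bytes) as a spamd request."""
    head = "%s SPAMC/1.2\r\nContent-length: %d\r\n\r\n" % (verb, len(message))
    return head.encode("ascii") + message


def parse_response_head(raw):
    """Parse the status line and Spam: header of a spamd response (bytes)."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("ascii", "replace")
    # rstrip() tolerates stray whitespace at the end of any header line
    lines = [line.rstrip() for line in head.split("\r\n")]
    status = _STATUS.match(lines[0])
    if status is None:
        raise ValueError("not a spamd response: %r" % lines[0])
    result = {
        "version": status.group(1),
        "code": int(status.group(2)),
        "message": status.group(3),
    }
    for line in lines[1:]:
        spam = _SPAM.match(line)
        if spam:
            result["is_spam"] = spam.group(1).lower() in ("true", "yes")
            result["score"] = float(spam.group(2))
            result["threshold"] = float(spam.group(3))
    return result
```

A Java client would do the same thing with a plain `Socket`: write the framed request, read to end of stream, and parse the head the same way.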

--j.