Okay, here is a small contribution to the list. Markus, this script:
grep "Total weight =" m:\imail\spool\spam\log\dec0628.log | gawk "{print $2, $NF}" > log0628.txt
will write a file called log0628.txt in the following space-delimited
format, with the time first and the total weight second (snip):
16:35:17 64
16:35:29 78
16:35:39 0
16:36:10 1
16:36:35 69
16:36:39 -13
16:36:50 90
16:36:51 37
16:36:55 74
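For anyone who would rather not use the UNIX utilities, the same extraction can be sketched in Python. This is only a rough equivalent of the grep/gawk line above, not a tested replacement. The function name time_and_weight is made up for illustration, the first sample line is the one quoted in Markus's message below, and the second is invented.

```python
def time_and_weight(lines):
    """Yield (time, weight) pairs from Declude log lines with a total weight."""
    for line in lines:
        if "Total weight =" not in line:
            continue
        fields = line.split()
        # Field 2 is the time and the last field is the weight, the same
        # fields that gawk's $2 and $NF select.
        yield fields[1], int(fields[-1])


# First line copied from Markus's message; the second line is invented.
sample = [
    "06/21/2003 00:01:42 Q843b181400780c01 HELOBOGUS:19 . Total weight = 19",
    "06/21/2003 00:01:43 Q843b181400780c01 some other log line",
]

pairs = list(time_and_weight(sample))
for time, weight in pairs:
    print(time, weight)
```

To run it against a real log, pass an open file instead of the sample list, for example time_and_weight(open(path)) with path set to your own log file.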
As Markus noted, the UNIX utilities needed to run these scripts can be
found at: http://unxutils.sourceforge.net/ There is no installation; just
extract the files contained in the zip file into a directory and
you're all set.
Here are a couple of additional scripts to get you thinking about the power
of these utilities. Hopefully people will share their own scripts with the
list as they develop them. The following script will list all of your
Declude tests and show how many messages were flagged by each test:
egrep "Message OK|Msg failed" m:\imail\spool\spam\log\dec0615.log | gawk "{print $6}" | sort | uniq -c | sort -rn
This will output a report like the following in less than 30 seconds. If
any of you have run some of the other JunkMail log reporting tools, which
can take hours to produce a report, you will find this quite extraordinary
in comparison:
9870 SPAMCHECK
8827 NOLEGITCONTENT
8082 IPNOTINMX
7728 SM-SPAM-L1
7466 SM-SPAM-L2
7154 SPAMSNIFFER
6793 WEIGHT36->
6541 SM-SPAM-L3
5749 REYNOLDS
5698 HEADERS-FILTER
5058 EASYNET-DNSBL
4867 SM-SPAM-L4
3932 SUBJECT-FILTER
3762 BODY-FILTER
3610 OSSRC
2973 SPAMHAUS
2902 OK
2827 SPAMCOP
2759 NJABL
2605 OSSOFT
2497 SM-SPAM-L5
2480 INTERSIL
1807 NOMOREFUNN
1486 VOX
1420 BLARSBL
1300 FIVETEN-SRC
1290 MAILFROM-FILTER
1203 NOABUSE
1188 NOPOSTMASTER
1077 HELO-FILTER
1070 REVDNS
1010 DSBL
952 SORBS
919 EASYNET-PROXIES
783 DSN
726 MONKEYPROXIES
689 BADHEADERS
680 HEURISTICS
680 HELOBOGUS
651 WEIGHT16-35
642 REVDNS-FILTER
422 SPAMBAG
416 BLITZEDALL
397 SPAMDOMAINS
391 LONGSUBJECT
356 ROUTING
306 OSPROXY
306 FIVETEN-OPTIN
300 COMMENTS
294 IPWHOIS
267 SUBJECTSPACES
247 UCEB
228 SM-ADULT-L1
221 SM-ADULT-L2
217 SM-ADULT-L3
210 BASE64
182 SM-ADULT-L4
178 LEADMON
149 SM-ADULT-L5
140 MAILFROM
114 BH-CHINA
97 FABEL
71 KOREA-NETS
71 KITHRUP
71 BH-KOREA
68 BONDEDSENDER
62 EASYNET-DYNA
55 DSBL-MULTI
54 SPAMHEADERS
53 PIGS
52 OSRELAY
51 ORDB
44 BH-JAPAN
34 OSDIPS
32 BH-ARGENTINA
29 BH-RUSSIA
27 BH-BRAZIL
18 BH-TAIWAN
18 BH-HONGKONG
16 KUNDENSERVER
14 BH-THAILAND
10 DNSRBL-DUN
8 EXSILIA-SPAM
7 FIVETEN-MULTI
4 NONENGLISH
3 REMOTEIP-FILTER
3 BH-MALAYSIA
1 OSLIST
1 BH-SINGAPORE
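The same count can be sketched in Python, which also avoids the sort step entirely. Treat this as an approximation of the pipeline above rather than a drop-in replacement. The function name count_tests and the sample log lines are invented for illustration, and the sketch assumes, as the gawk command does, that the test name is the sixth whitespace-separated field.

```python
from collections import Counter


def count_tests(lines):
    """Count how often each test name appears as the sixth field."""
    counts = Counter()
    for line in lines:
        if "Message OK" in line or "Msg failed" in line:
            fields = line.split()
            # gawk would print an empty line when there is no sixth field;
            # this sketch skips such lines instead.
            if len(fields) >= 6:
                counts[fields[5]] += 1
    return counts


# Invented sample lines; real Declude log lines may differ in detail.
sample = [
    "06/15/2003 00:01:42 Q0001 Msg failed SORBS (hypothetical detail)",
    "06/15/2003 00:01:42 Q0001 Msg failed SPAMCOP (hypothetical detail)",
    "06/15/2003 00:01:50 Q0002 Msg failed SORBS (hypothetical detail)",
    "06/15/2003 00:01:50 Q0002 Subject: not counted",
]

report = count_tests(sample).most_common()  # highest count first
for name, count in report:
    print(f"{count:7d} {name}")
```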
The following script will allow you to view the subject line of all messages
flagged by whatever test you define in the script (in this case I used
"SORBS"), and will sort them by count:
egrep "Msg failed SORBS|Subject:" m:\imail\spool\spam\log\dec0617.log | grep -A 1 SORBS | grep Subject | cut -b 39- | sort -f | uniq -ic | sort -rfn
The output looks like (snip):
10 Subject: You want a bigger one?
9 Subject: Is your manhood too small?
9 Subject: CheapTrips Airfares: Best Price Guaranteed
8 Subject: prevent stretch marks during pregnancy
8 Subject: Baby Boomers to GenX dhj k
8 Subject: ##Low Income Funding Program vyig
8 Subject: ##Low Income Funding Program h ymuviwtx uggldu
7 Subject: View Photos Of Sexy Singles In Your Area
7 Subject: SUCCESS... dizaa
7 Subject: rsvp-feel better guaranteed
7 Subject: Earn $500 a Week Easily !
6 Subject: Increase your Penis by 2 to 5 full inches in Weeks.
6 Subject: Earn $2000 Weekly Easily!
5 Subject: good news - accelerates recovery from athletic injury
5 Subject: Bargain Shoes
5 Subject: >#Government Loan Program### ryb o q
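Here is a rough Python sketch of the same idea, in case the chain of greps is hard to follow. It keeps a Subject line only when it directly follows a "Msg failed" line for the chosen test (ignoring all other log lines in between), drops the first 38 characters as "cut -b 39-" does, and counts subjects without regard to case. The function name subjects_for_test and the sample lines are invented, and the 38-character prefix is an assumption based on the date, time, and queue ID layout seen in Markus's sample line.

```python
from collections import Counter


def subjects_for_test(lines, test="SORBS", prefix_len=38):
    """Return (count, subject) pairs for messages that failed the given test."""
    marker = "Msg failed " + test
    counts = Counter()
    display = {}              # first spelling seen for each lowercased subject
    previous_failed = False   # did the last relevant line fail the test?
    for line in lines:
        if marker in line:
            previous_failed = True
        elif "Subject:" in line:
            if previous_failed:
                subject = line[prefix_len:].rstrip()
                key = subject.lower()
                counts[key] += 1
                display.setdefault(key, subject)
            previous_failed = False
    return [(count, display[key]) for key, count in counts.most_common()]


# Invented sample lines sharing a 38-character date/time/ID prefix.
prefix = "06/17/2003 00:01:42 Q843b181400780c01 "
sample = [
    prefix + "Msg failed SORBS (hypothetical detail)",
    prefix + "Subject: Bargain Shoes",
    prefix + "Subject: not flagged by the test",
    prefix + "Msg failed SORBS (hypothetical detail)",
    prefix + "Subject: bargain shoes",
    prefix + "Msg failed SORBS (hypothetical detail)",
    prefix + "Subject: Earn $2000 Weekly Easily!",
]

result = subjects_for_test(sample)
for count, subject in result:
    print(f"{count:7d} {subject}")
```

Change the test argument to look at a different test, for example subjects_for_test(lines, test="SPAMCOP").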
These scripts have to run all on one line, with no carriage returns, in
order to work properly. Also, you will need to run these scripts from the
directory that you have extracted the UNIX utilities to. This is because
some of the files have the same name as Windows utilities, like "sort" for
example.
Speaking of "sort", which is used in a couple of these scripts, there
appears to be a size limit of about 2 MB on the content you are trying to
sort. It will only be an issue if your log files are around 25 MB or larger,
since the script sorts only the output of the first grep command, not the
whole log. I have sent an e-mail to the developer asking him about this
limit, since there appears to be no such limit on our Linux
machines, where I can run the same script on a log file of any size.
Have fun!
Bill
----- Original Message -----
From: "Markus Gufler" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, June 28, 2003 1:17 AM
Subject: RE: [Declude.JunkMail] time-dependently hold weight
>
>
> > I've considered this a few times, every time I prepare to
> > suggest it I remember what happened with my idea to test for
> > long subjects, there just isn't enough uniformity.
>
> Well. Maybe my idea is expressed from "the wrong side".
> Watching the diagram I can also simply fathom that my current hold
> weight is a little bit too low.
> After adding some new SpamChk tests (we are currently testing) and some
> new RBL-lists, the average value has increased a little bit. So the only
> thing I have to do is to increase slightly the hold weight (or decrease
> the points for every single test)
>
> The fact remains that only 13% of our FPs were received outside of
> business hours. If there is some way to detect the sender's local current
> time or timezone, this will surely help to further reduce false positives
> or false negatives using a "time-dependent hold weight".
>
>
> > BTW, the graph is amazing, how is it made?
>
> Hmmm, it's not an "out of the box" tool, but maybe someone can develop
> it. I think it should be very easy but at the moment I'm not familiar
> with any RAD tool...
>
> So here are the steps I've done:
>
> 1.) grep all lines from the declude logfile containing "Total weight ="
> Grep.exe is part of the UNIX tools, which you can find at
> http://unxutils.sourceforge.net/
> Don't be afraid to "install" these tools. You can simply extract the
> zip archive.
>
> C:\imail\spool\grep -U "Total weight =" dec0624.log > c:\imail\spool\tw0624.log
>
> This will create a new file tw0624.log in the spool folder containing
> only the lines with the total weight of any message processed by declude
> junkmail.
>
> Note: You need at least loglevel MID to see the "Total weight" lines in
> the logfile.
>
> 2.) Now I've "elaborated" my tw-file
> In the following original line
> 06/21/2003 00:01:42 Q843b181400780c01 HELOBOGUS:19 . Total weight = 19
>
> a.) delete the date "06/21/2003 "
> 00:01:42 Q843b181400780c01 HELOBOGUS:19 . Total weight = 19
>
> b.) replace the " Q" after the time with ";"
> 00:01:42;843b181400780c01 HELOBOGUS:19 . Total weight = 19
>
> c.) replace the "Total weight = " with ";"
> 00:01:42;843b181400780c01 HELOBOGUS:19 . ;19
>
> 3.) Now you have a CSV file with the time in the first and the weight in
> the third column.
> You can import this for example into MS Excel
>
> 4.) To "decode" the HH:MM:SS time format into something usable for a
> diagram I've used the following formula:
> C1 = (HOUR(A1)*3600)+(MINUTE(A1)*60)+SECOND(A1)
>
> This will give you in cell C1 the timecode in seconds
>
> 5.) Now you can play around with different diagrams, ...
> For example you can also sort all rows by the weight to create a graph
> like the ones attached to this message.
> This will show you if you have done a good job configuring the tests so
> that in the critical zone between 80 and 120% of your hold weight there
> are minimal messages. (high slope)
>
> I know it looks like a lot of work, but it's done in a few minutes and will
> give you a great view of what's going on in your junkmail filter.
>
> All of these steps can be automated, if someone has the time and knowledge
> to create a small reporting tool...
>
> Markus
>
>
>
>
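Steps 2 through 4 of Markus's message can also be scripted rather than done by hand in an editor and Excel. The following is only a minimal Python sketch under that assumption. The function names to_csv_row and seconds_since_midnight are made up for illustration, and the sample line is the one Markus quotes.

```python
def to_csv_row(line):
    """Turn one 'Total weight' log line into 'time;id tests;weight'."""
    _date, rest = line.split(" ", 1)              # step a: drop the date
    rest = rest.replace(" Q", ";", 1)             # step b: " Q" after the time
    return rest.replace("Total weight = ", ";")   # step c: mark the weight


def seconds_since_midnight(hhmmss):
    """Convert HH:MM:SS to seconds, like the spreadsheet formula in step 4."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Sample line copied from Markus's message.
line = "06/21/2003 00:01:42 Q843b181400780c01 HELOBOGUS:19 . Total weight = 19"

row = to_csv_row(line)
timecode = seconds_since_midnight(row.split(";")[0])
print(row)
print(timecode)
```

The rows can be written to a file and imported into a spreadsheet as before, or the timecode and weight can be fed straight into whatever charting tool you prefer.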