Wow, I can't believe you guys, this stuff is amazing.  Now to figure out what grep is 
so I can use it!

Would something written in PHP be as powerful and fast?

Dan


On Saturday, June 28, 2003 20:09, Bill Landry <[EMAIL PROTECTED]> wrote:
>Okay, here is a small contribution to the list.  Markus, this
>script:
>
>grep "Total weight =" m:\imail\spool\spam\log\dec0628.log | gawk "{print $2, $NF}" > log0628.txt
>
>will output a file called log0628.txt in the following space-delimited
>format (time, then total weight; snip):
>
>16:35:17 64
>16:35:29 78
>16:35:39 0
>16:36:10 1
>16:36:35 69
>16:36:39 -13
>16:36:50 90
>16:36:51 37
>16:36:55 74
>
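For comparison, a rough Python equivalent of this one-liner (a sketch, not part of Bill's message). It assumes the MID-level log layout Markus shows further down, where the time is the second whitespace-separated field and the weight is the last one; the function name extract_weights and the script name in the usage comment are hypothetical.

```python
import sys


def extract_weights(lines):
    """Return (time, weight) pairs, mirroring:
    grep "Total weight =" LOG | gawk "{print $2, $NF}"
    """
    pairs = []
    for line in lines:
        if "Total weight =" not in line:
            continue  # grep: skip lines without a total weight
        fields = line.split()  # split on runs of whitespace, like gawk's default
        pairs.append((fields[1], fields[-1]))  # $2 is the time, $NF the weight
    return pairs


if __name__ == "__main__":
    # Usage (hypothetical script name): python weights.py dec0628.log > log0628.txt
    if len(sys.argv) > 1:
        # latin-1 decodes any byte, so odd characters in the log never raise errors
        with open(sys.argv[1], encoding="latin-1") as log:
            for time_str, weight in extract_weights(log):
                print(time_str, weight)
```

Like the gawk version, it keeps the weight as text, so negative weights such as -13 pass through unchanged.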
>As Markus noted, the UNIX utilities needed to run these scripts can be
>found at: http://unxutils.sourceforge.net/  There is no installation; just
>extract the files contained in the zip file into a directory and you're
>all set.
>
>Here are a couple of additional scripts to get you thinking about the power
>of these utilities; hopefully others will share their own scripts with the
>list as they develop them.  The following script will list all of your
>Declude tests and show how many messages were flagged by each test:
>
>egrep "Message OK|Msg failed" m:\imail\spool\spam\log\dec0615.log | gawk "{print $6}" | sort | uniq -c | sort -rn
>
>This will output a report like the following in less than 30 seconds (if
>you have run any of the other JunkMail log reporting tools, you will find
>this quite extraordinary compared to the hours those tools take to produce
>a report):
>
>   9870 SPAMCHECK
>   8827 NOLEGITCONTENT
>   8082 IPNOTINMX
>   7728 SM-SPAM-L1
>   7466 SM-SPAM-L2
>   7154 SPAMSNIFFER
>   6793 WEIGHT36->
>   6541 SM-SPAM-L3
>   5749 REYNOLDS
>   5698 HEADERS-FILTER
>   5058 EASYNET-DNSBL
>   4867 SM-SPAM-L4
>   3932 SUBJECT-FILTER
>   3762 BODY-FILTER
>   3610 OSSRC
>   2973 SPAMHAUS
>   2902 OK
>   2827 SPAMCOP
>   2759 NJABL
>   2605 OSSOFT
>   2497 SM-SPAM-L5
>   2480 INTERSIL
>   1807 NOMOREFUNN
>   1486 VOX
>   1420 BLARSBL
>   1300 FIVETEN-SRC
>   1290 MAILFROM-FILTER
>   1203 NOABUSE
>   1188 NOPOSTMASTER
>   1077 HELO-FILTER
>   1070 REVDNS
>   1010 DSBL
>    952 SORBS
>    919 EASYNET-PROXIES
>    783 DSN
>    726 MONKEYPROXIES
>    689 BADHEADERS
>    680 HEURISTICS
>    680 HELOBOGUS
>    651 WEIGHT16-35
>    642 REVDNS-FILTER
>    422 SPAMBAG
>    416 BLITZEDALL
>    397 SPAMDOMAINS
>    391 LONGSUBJECT
>    356 ROUTING
>    306 OSPROXY
>    306 FIVETEN-OPTIN
>    300 COMMENTS
>    294 IPWHOIS
>    267 SUBJECTSPACES
>    247 UCEB
>    228 SM-ADULT-L1
>    221 SM-ADULT-L2
>    217 SM-ADULT-L3
>    210 BASE64
>    182 SM-ADULT-L4
>    178 LEADMON
>    149 SM-ADULT-L5
>    140 MAILFROM
>    114 BH-CHINA
>     97 FABEL
>     71 KOREA-NETS
>     71 KITHRUP
>     71 BH-KOREA
>     68 BONDEDSENDER
>     62 EASYNET-DYNA
>     55 DSBL-MULTI
>     54 SPAMHEADERS
>     53 PIGS
>     52 OSRELAY
>     51 ORDB
>     44 BH-JAPAN
>     34 OSDIPS
>     32 BH-ARGENTINA
>     29 BH-RUSSIA
>     27 BH-BRAZIL
>     18 BH-TAIWAN
>     18 BH-HONGKONG
>     16 KUNDENSERVER
>     14 BH-THAILAND
>     10 DNSRBL-DUN
>      8 EXSILIA-SPAM
>      7 FIVETEN-MULTI
>      4 NONENGLISH
>      3 REMOTEIP-FILTER
>      3 BH-MALAYSIA
>      1 OSLIST
>      1 BH-SINGAPORE
>
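A Python sketch of the same report, assuming (as gawk "{print $6}" implies) that the test name is the sixth whitespace-separated field of each matching line; count_tests and the script name in the usage comment are hypothetical. Because it tallies names in a dictionary instead of sorting every matching line, only the short list of distinct test names ever gets sorted, which sidesteps the kind of sort size limit Bill describes below.

```python
import sys
from collections import Counter


def count_tests(lines):
    """Count flagged messages per test, mirroring:
    egrep "Message OK|Msg failed" LOG | gawk "{print $6}" | sort | uniq -c | sort -rn
    """
    counts = Counter()
    for line in lines:
        if "Message OK" in line or "Msg failed" in line:
            fields = line.split()
            # gawk prints an empty string when a line has fewer than six fields
            counts[fields[5] if len(fields) > 5 else ""] += 1
    # most_common() lists the highest counts first, like "sort -rn"
    return counts.most_common()


if __name__ == "__main__":
    # Usage (hypothetical script name): python tests.py dec0615.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="latin-1") as log:
            for test, count in count_tests(log):
                print(f"{count:7d} {test}")
```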
>The following script will allow you to view the subject line of all messages
>flagged by whatever test you define in the script (in this case I used
>"SORBS"), and will count identical subjects and sort them by count:
>
>egrep "Msg failed SORBS|Subject:" m:\imail\spool\spam\log\dec0617.log | grep -A 1 SORBS | grep Subject | cut -b 39- | sort -f | uniq -ic | sort -rfn
>
>The output looks like (snip):
>
>     10 Subject: You want a bigger one?
>      9 Subject: Is your manhood too small?
>      9 Subject: CheapTrips Airfares: Best Price Guaranteed
>      8 Subject: prevent stretch marks during pregnancy
>      8 Subject: Baby Boomers to GenX dhj k
>      8 Subject: ##Low Income Funding Program vyig
>      8 Subject: ##Low Income Funding Program h ymuviwtx  uggldu
>      7 Subject: View Photos Of Sexy Singles In Your Area
>      7 Subject: SUCCESS... dizaa
>      7 Subject: rsvp-feel better guaranteed
>      7 Subject: Earn $500 a Week Easily !
>      6 Subject: Increase your Penis by 2 to 5 full inches in Weeks.
>      6 Subject: Earn $2000 Weekly Easily!
>      5 Subject: good news - accelerates recovery from athletic injury
>      5 Subject: Bargain Shoes
>      5 Subject: >#Government Loan Program### ryb o q
>
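A Python sketch of this pipeline, assuming, as cut -b 39- implies, that the first 38 characters of each log line hold the date, time and queue ID (true for the sample line Markus quotes further down, "06/21/2003 00:01:42 Q843b181400780c01 ..."). The function name and its test/offset parameters are hypothetical. Python slices characters rather than bytes, which only matters for non-ASCII lines, and it may display a different capitalization of a subject than uniq -i would.

```python
import sys


def subjects_for_test(lines, test="SORBS", offset=38):
    """Count subjects of messages flagged by a test, mirroring:
    egrep "Msg failed SORBS|Subject:" LOG | grep -A 1 SORBS | grep Subject
        | cut -b 39- | sort -f | uniq -ic | sort -rfn
    """
    # egrep: keep only "Msg failed <test>" lines and "Subject:" lines
    kept = [line.rstrip("\r\n") for line in lines
            if f"Msg failed {test}" in line or "Subject:" in line]
    # grep -A 1: every line naming the test, plus the line right after it
    selected = set()
    for i, line in enumerate(kept):
        if test in line:
            selected.add(i)
            if i + 1 < len(kept):
                selected.add(i + 1)
    # cut + sort -f + uniq -ic: count subjects case-insensitively
    counts = {}  # lower-cased subject -> [count, first spelling seen]
    for i in sorted(selected):
        if "Subject" in kept[i]:
            subject = kept[i][offset:]  # cut -b 39- (slices characters, not bytes)
            entry = counts.setdefault(subject.lower(), [0, subject])
            entry[0] += 1
    # sort -rfn: highest counts first
    return sorted(((n, s) for n, s in counts.values()), key=lambda item: -item[0])


if __name__ == "__main__":
    # Usage (hypothetical script name): python subjects.py dec0617.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="latin-1") as log:
            for count, subject in subjects_for_test(log):
                print(f"{count:7d} {subject}")
```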
>Each of these scripts has to be entered all on one line, with no carriage
>returns, in order to work properly.  Also, you will need to run them from
>the directory that you extracted the UNIX utilities to, because some of the
>files have the same names as Windows utilities ("sort", for example) and
>you want the UNIX versions to run.
>
>Speaking of "sort", which is used in a couple of these scripts, there
>appears to be a size limitation of about 2 MB on the content you are trying
>to sort.  This will only be an issue if your log files are around 25 MB or
>larger, since the script sorts the output of the first grep command.  I
>have sent an e-mail to the developer asking about this size limitation,
>since there appears to be no such limitation on our Linux machines, where I
>can run the same script on a log file of any size.
>
>Have fun!
>
>Bill
>
>----- Original Message ----- 
>From: "Markus Gufler" <[EMAIL PROTECTED]>
>To: <[EMAIL PROTECTED]>
>Sent: Saturday, June 28, 2003 1:17 AM
>Subject: RE: [Declude.JunkMail] time-dependently hold weight
>
>
>>
>>
>> > I've considered this a few times; every time I prepare to
>> > suggest it, I remember what happened with my idea to test for
>> > long subjects: there just isn't enough uniformity.
>>
>> Well, maybe I expressed my idea from "the wrong side".
>> Looking at the diagram, I can also simply infer that my current hold
>> weight is a little bit too low.
>> After adding some new SpamChk tests (which we are currently testing) and
>> some new RBLs, the average weight has increased a little. So the only
>> thing I have to do is increase the hold weight slightly (or decrease
>> the points for each individual test).
>>
>> The fact remains that only 13% of our false positives were received
>> outside business hours. If there were some way to detect the sender's
>> local time or time zone, a "time-dependent hold weight" would certainly
>> help further reduce false positives or false negatives.
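To make the idea concrete, a hypothetical Python sketch of a time-dependent hold weight. It is not Declude configuration syntax: the business hours and both weights are made-up values, the server's receive time stands in for the sender's local time (which, as noted above, is hard to determine), and it assumes a message is held once its total weight reaches the hold weight.

```python
from datetime import time

# Hypothetical values: substitute your own business hours and weights.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)
DAY_HOLD_WEIGHT = 30    # business hours, when most false positives arrive
NIGHT_HOLD_WEIGHT = 25  # outside business hours, when few do


def hold_weight_for(received):
    """Pick the hold weight for a message received at `received` (server time)."""
    if BUSINESS_START <= received < BUSINESS_END:
        return DAY_HOLD_WEIGHT
    return NIGHT_HOLD_WEIGHT


def should_hold(total_weight, received):
    """Assumed rule: hold once the total weight reaches the hold weight."""
    return total_weight >= hold_weight_for(received)


if __name__ == "__main__":
    print(should_hold(27, time(2, 30)))  # True: the night threshold is 25
    print(should_hold(27, time(14, 0)))  # False: the day threshold is 30
```

Lowering the threshold only outside business hours follows from the 13% figure: fewer legitimate messages arrive then, so a stricter hold costs less.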
>>
>>
>> > BTW, the graph is amazing, how is it made?
>>
>> Hmmm, it's not made with an "out of the box" tool, but maybe someone
>> could develop one. I think it should be very easy, but at the moment
>> I'm not familiar with any RAD tool...
>>
>> So here are the steps I followed:
>>
>> 1.) grep all lines containing "Total weight =" from the Declude log file.
>> Grep.exe is part of the Unix tools you can find at
>> http://unxutils.sourceforge.net/
>> Don't worry about "installing" these tools; you can simply extract the
>> zip archive.
>>
>> C:\imail\spool\grep -U "Total weight =" dec0624.log > c:\imail\spool\tw0624.log
>>
>> This will create a new file, tw0624.log, in the spool folder containing
>> only the lines with the total weight of every message processed by
>> Declude JunkMail.
>>
>> Note: You need at least loglevel MID to see the "Total weight" lines in
>> the logfile.
>>
>> 2.) Next, I processed my tw file.
>> Starting from this original line:
>> 06/21/2003 00:01:42 Q843b181400780c01 HELOBOGUS:19 .  Total weight = 19
>>
>> a.) delete the date "06/21/2003 "
>> 00:01:42 Q843b181400780c01 HELOBOGUS:19 .  Total weight = 19
>>
>> b.) replace the " Q" after the time with ";"
>> 00:01:42;843b181400780c01 HELOBOGUS:19 .  Total weight = 19
>>
>> c.) replace the "Total weight = " with ";"
>> 00:01:42;843b181400780c01 HELOBOGUS:19 .  ;19
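Steps a.) to c.) can be sketched in Python as follows, assuming every line looks like the sample above (date first, then the time, then a queue ID starting with "Q"). The names to_csv_line and convert_file and the script name in the usage comment are hypothetical.

```python
import sys


def to_csv_line(line):
    """Apply steps a.) to c.) to one "Total weight" log line."""
    _, rest = line.split(" ", 1)                 # a.) drop the date and its space
    rest = rest.replace(" Q", ";", 1)            # b.) the first " Q" follows the time
    return rest.replace("Total weight = ", ";")  # c.) mark the weight column


def convert_file(src_path, dst_path):
    """Convert every non-blank line of src_path and write it to dst_path."""
    with open(src_path, encoding="latin-1") as src, \
            open(dst_path, "w", encoding="latin-1") as dst:
        for line in src:
            if line.strip():
                dst.write(to_csv_line(line.rstrip("\r\n")) + "\n")


if __name__ == "__main__":
    # Usage (hypothetical script name): python tw2csv.py tw0624.log tw0624.csv
    if len(sys.argv) == 3:
        convert_file(sys.argv[1], sys.argv[2])
```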
>>
>> 3.) Now you have a semicolon-separated (CSV) file with the time in the
>> first column and the weight in the third.
>> You can import it into MS Excel, for example.
>>
>> 4.) To convert the HH:MM:SS time format into something usable for a
>> diagram, I used the following formula in cell C1:
>> =(HOUR(A1)*3600)+(MINUTE(A1)*60)+SECOND(A1)
>>
>> This will give you the time in seconds (since midnight) in cell C1.
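The same conversion outside Excel, as a small Python function applying that formula to an "HH:MM:SS" string (the name to_seconds is hypothetical).

```python
def to_seconds(hhmmss):
    """Convert "HH:MM:SS" to seconds since midnight: h*3600 + m*60 + s."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    print(to_seconds("00:01:42"))  # 102
    print(to_seconds("16:35:17"))  # 59717
```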
>>
>> 5.) Now you can play around with different diagrams...
>> For example, you can also sort all rows by weight to create a graph
>> like the one attached to this message.
>> This will show you whether you have done a good job configuring the
>> tests, so that as few messages as possible fall in the critical zone
>> between 80% and 120% of your hold weight (a steep slope there).
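A Python sketch of that check: it sorts the weights for the graph and reports what fraction of messages falls between 80% and 120% of the hold weight. The function name and the hold weight of 70 in the example are hypothetical; the nine sample weights are the ones shown earlier in this thread.

```python
def critical_zone_share(weights, hold_weight, low=0.8, high=1.2):
    """Fraction of messages whose weight lies in [low*hold_weight, high*hold_weight]."""
    if not weights:
        return 0.0
    lower, upper = low * hold_weight, high * hold_weight
    inside = sum(1 for w in weights if lower <= w <= upper)
    return inside / len(weights)


if __name__ == "__main__":
    weights = [64, 78, 0, 1, 69, -13, 90, 37, 74]  # sample weights from this thread
    print(sorted(weights))  # rows sorted by weight, for the graph
    print(round(critical_zone_share(weights, hold_weight=70), 3))  # 4 of 9 -> 0.444
```

A smaller share means a steeper curve around the hold weight, i.e. fewer borderline messages.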
>>
>> I know it looks like a lot of work, but it takes only a few minutes and
>> gives you a great view of what's going on in your JunkMail filter.
>>
>> All of these steps could be automated, if someone has the time and
>> knowledge to create a small reporting tool...
>>
>> Markus
>>
>>
>>
>>
>
>---
>[This E-mail was scanned for viruses by Declude Virus
>(http://www.declude.com)]
>
>---
>This E-mail came from the Declude.JunkMail mailing list.  To
>unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
>type "unsubscribe Declude.JunkMail".  The archives can be found
>at http://www.mail-archive.com.
>
