Wow, I can't believe you guys, this stuff is amazing. Now to figure out what grep is so I can use it!
Would something written in php be as strong/fast?

Dan

On Saturday, June 28, 2003 20:09, Bill Landry <[EMAIL PROTECTED]> wrote:
>Okay, here is a small contribution to the list.  Markus, this
>script:
>
>grep "Total weight =" m:\imail\spool\spam\log\dec0628.log | gawk "{print $2, $NF}" > log0628.txt
>
>will output a file called log0628.txt in the following space-delimited
>format (snip):
>
>16:35:17 64
>16:35:29 78
>16:35:39 0
>16:36:10 1
>16:36:35 69
>16:36:39 -13
>16:36:50 90
>16:36:51 37
>16:36:55 74
>
>As Markus noted, the UNIX utilities needed to run these scripts can be
>found at: http://unxutils.sourceforge.net/  There is no installation;
>simply extract the files contained in the zip file into a directory and
>you're all set.
>
>Here are a couple of additional scripts to get you thinking about the power
>of these utilities; hopefully people will share their own scripts with the
>list as they develop them.  The following script will list all of your
>Declude tests and show how many messages were flagged by each
>test:
>
>egrep "Message OK|Msg failed" m:\imail\spool\spam\log\dec0615.log | gawk "{print $6}" | sort | uniq -c | sort -rn
>
>This will output a report like the following, in less than 30 seconds (if
>any of you have run some of the other JunkMail log reporting tools, you will
>find this quite extraordinary in comparison to the hours it takes to run
>reports with these other reporting tools):
>
>   9870 SPAMCHECK
>   8827 NOLEGITCONTENT
>   8082 IPNOTINMX
>   7728 SM-SPAM-L1
>   7466 SM-SPAM-L2
>   7154 SPAMSNIFFER
>   6793 WEIGHT36->
>   6541 SM-SPAM-L3
>   5749 REYNOLDS
>   5698 HEADERS-FILTER
>   5058 EASYNET-DNSBL
>   4867 SM-SPAM-L4
>   3932 SUBJECT-FILTER
>   3762 BODY-FILTER
>   3610 OSSRC
>   2973 SPAMHAUS
>   2902 OK
>   2827 SPAMCOP
>   2759 NJABL
>   2605 OSSOFT
>   2497 SM-SPAM-L5
>   2480 INTERSIL
>   1807 NOMOREFUNN
>   1486 VOX
>   1420 BLARSBL
>   1300 FIVETEN-SRC
>   1290 MAILFROM-FILTER
>   1203 NOABUSE
>   1188 NOPOSTMASTER
>   1077 HELO-FILTER
>   1070 REVDNS
>   1010 DSBL
>    952 SORBS
>    919 EASYNET-PROXIES
>    783 DSN
>    726 MONKEYPROXIES
>    689 BADHEADERS
>    680 HEURISTICS
>    680 HELOBOGUS
>    651 WEIGHT16-35
>    642 REVDNS-FILTER
>    422 SPAMBAG
>    416 BLITZEDALL
>    397 SPAMDOMAINS
>    391 LONGSUBJECT
>    356 ROUTING
>    306 OSPROXY
>    306 FIVETEN-OPTIN
>    300 COMMENTS
>    294 IPWHOIS
>    267 SUBJECTSPACES
>    247 UCEB
>    228 SM-ADULT-L1
>    221 SM-ADULT-L2
>    217 SM-ADULT-L3
>    210 BASE64
>    182 SM-ADULT-L4
>    178 LEADMON
>    149 SM-ADULT-L5
>    140 MAILFROM
>    114 BH-CHINA
>     97 FABEL
>     71 KOREA-NETS
>     71 KITHRUP
>     71 BH-KOREA
>     68 BONDEDSENDER
>     62 EASYNET-DYNA
>     55 DSBL-MULTI
>     54 SPAMHEADERS
>     53 PIGS
>     52 OSRELAY
>     51 ORDB
>     44 BH-JAPAN
>     34 OSDIPS
>     32 BH-ARGENTINA
>     29 BH-RUSSIA
>     27 BH-BRAZIL
>     18 BH-TAIWAN
>     18 BH-HONGKONG
>     16 KUNDENSERVER
>     14 BH-THAILAND
>     10 DNSRBL-DUN
>      8 EXSILIA-SPAM
>      7 FIVETEN-MULTI
>      4 NONENGLISH
>      3 REMOTEIP-FILTER
>      3 BH-MALAYSIA
>      1 OSLIST
>      1 BH-SINGAPORE
>
>The following script will allow you to view the subject line of all messages
>flagged by whatever test you define in the script (in this case I used
>"SORBS"), and will sort them by count:
>
>egrep "Msg failed SORBS|Subject:" m:\imail\spool\spam\log\dec0617.log | grep -A 1 SORBS | grep Subject | cut -b 39- | sort -f | uniq -ic | sort -rfn
>
>The output looks like (snip):
>
>     10 Subject: You want a bigger one?
>      9 Subject: Is your manhood too small?
>      9 Subject: CheapTrips Airfares: Best Price Guaranteed
>      8 Subject: prevent stretch marks during pregnancy
>      8 Subject: Baby Boomers to GenX dhj k
>      8 Subject: ##Low Income Funding Program vyig
>      8 Subject: ##Low Income Funding Program h ymuviwtx uggldu
>      7 Subject: View Photos Of Sexy Singles In Your Area
>      7 Subject: SUCCESS... dizaa
>      7 Subject: rsvp-feel better guaranteed
>      7 Subject: Earn $500 a Week Easily !
>      6 Subject: Increase your Penis by 2 to 5 full inches in Weeks.
>      6 Subject: Earn $2000 Weekly Easily!
>      5 Subject: good news - accelerates recovery from athletic injury
>      5 Subject: Bargain Shoes
>      5 Subject: >#Government Loan Program### ryb o q
>
>These scripts have to run all on one line, with no carriage returns, in
>order to work properly.  Also, you will need to run these scripts from the
>directory that you have extracted the UNIX utilities to.  This is because
>some of the files have the same name as Windows utilities, like "sort" for
>example.
>
>Speaking of "sort", which is used in a couple of these scripts, there
>appears to be about a 2 MB size limitation on the content you are trying to
>sort.  It will only be an issue if your log files are around 25 MB or larger,
>since the script is trying to sort on the output of the first grep command.
>I have sent an e-mail to the developer asking him about this size
>limitation, since there appears to be no size limitation on our Linux
>machines, where I can run the same script on any size log file.
>
>Have fun!
>
>Bill
>
>----- Original Message -----
>From: "Markus Gufler" <[EMAIL PROTECTED]>
>To: <[EMAIL PROTECTED]>
>Sent: Saturday, June 28, 2003 1:17 AM
>Subject: RE: [Declude.JunkMail] time-dependently hold weight
>
>
>>
>>
>> > I've considered this a few times, every time I prepare to
>> > suggest it I remember what happened with my idea to test for
>> > long subjects, there just isn't enough uniformity.
>>
>> Well. Maybe my idea is expressed from "the wrong side".
>> Watching the diagram I can also simply fathom that my current hold
>> weight is a little bit too low.
>> After adding some new SpamChk tests (we are currently testing) and some
>> new RBL-lists, the average value has increased a little bit. So the only
>> thing I have to do is to increase slightly the hold weight (or decrease
>> the points for every single test)
>>
>> The fact remains that only 13% of our FPs were received outside of
>> business hours.
>> If there is some way to detect the sender's local current
>> time or timezone this for sure will help again to reduce false positives
>> or false negatives using a "time-dependently hold weight"
>>
>>
>> > BTW, the graph is amazing, how is it made?
>>
>> Hmmm, it's not an "out of the box" tool, but maybe someone can develop
>> it. I think it should be very easy but at the moment I'm not familiar
>> with any RAD tool...
>>
>> So here are the steps I've done:
>>
>> 1.) grep all lines from the declude logfile containing "Total weight ="
>> Grep.exe is part of the unixtools which you can find at
>> http://unxutils.sourceforge.net/
>> Don't be afraid to "install" these tools. You can simply extract the
>> zip-archive.
>>
>> C:\imail\spool\grep -U "Total weight =" dec0624.log > c:\imail\spool\tw0624.log
>>
>> This will create a new file tw0624.log in the spool folder containing
>> only the lines with the total weight of any message processed by declude
>> junkmail.
>>
>> Note: You need at least loglevel MID to see the "Total weight" lines in
>> the logfile.
>>
>> 2.) Now I've "elaborated" my tw-file
>> In the following original line
>> 06/21/2003 00:01:42 Q843b181400780c01 HELOBOGUS:19 . Total weight = 19
>>
>> a.) delete the date "06/21/2003 "
>> 00:01:42 Q843b181400780c01 HELOBOGUS:19 . Total weight = 19
>>
>> b.) replace the " Q" after the time with ";"
>> 00:01:42;843b181400780c01 HELOBOGUS:19 . Total weight = 19
>>
>> c.) replace the "Total weight = " with ";"
>> 00:01:42;843b181400780c01 HELOBOGUS:19 . ;19
>>
>> 3.) Now you have a CSV file with the time in the first and the weight in
>> the third column.
>> You can import this for example into MS Excel
>>
>> 4.) To "decode" the HH:MM:SS time format into something usable for a
>> diagram I've used the following formula:
>> C1 = (HOUR(A1)*3600)+(MINUTE(A1)*60)+SECOND(A1)
>>
>> This will give you in cell C1 the timecode in seconds
>>
>> 5.) Now you can play around with different diagrams, ...
>> For example you can also sort all rows by the weight to create a graph
>> like the ones attached to this message.
>> This will show you if you have done a good job configuring the tests so
>> that in the critical zone between 80 and 120% of your hold weight there
>> are minimal messages. (high slope)
>>
>> I know it looks like a lot of work, but it's done in a few minutes and will
>> give you a great view of what's going on in your junkmail filter.
>>
>> All of these steps can be automated, if someone has the time and knowledge
>> to create a small reporting tool...
>>
>> Markus
>>
>>
>>
>>
>
>---
>[This E-mail was scanned for viruses by Declude Virus
>(http://www.declude.com)]
>
>---
>This E-mail came from the Declude.JunkMail mailing list.  To
>unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
>type "unsubscribe Declude.JunkMail".  The archives can be found
>at http://www.mail-archive.com.
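
For anyone who would rather not do the search-and-replace in Markus's steps 2 through 4 by hand, here is a minimal sketch of how they might be scripted. It assumes a POSIX shell with awk on the path; the gawk from the UnxUtils zip should accept the same awk program, but that is an assumption. The function name tw_to_csv and the three-column layout (time, seconds since midnight, weight) are illustrative choices and do not come from the thread.

```shell
# Sketch only: tw_to_csv is a hypothetical helper, not part of Declude or UnxUtils.
# Input:  Declude log lines on stdin, e.g.
#   06/21/2003 00:01:42 Q843b181400780c01 HELOBOGUS:19 . Total weight = 19
# Output: one "HH:MM:SS;seconds;weight" line per "Total weight =" line.
tw_to_csv() {
  awk '
    /Total weight =/ {
      sub(/\r$/, "")                         # drop a trailing CR from Windows line endings
      split($2, t, ":")                      # field 2 is the HH:MM:SS timestamp
      secs = t[1] * 3600 + t[2] * 60 + t[3]  # same arithmetic as the Excel formula in step 4
      print $2 ";" secs ";" $NF              # the last field is the total weight
    }
  '
}
```

For the sample line in step 2 this prints 00:01:42;102;19. It could be fed straight from the log, for example: grep "Total weight =" dec0624.log | tw_to_csv > tw0624.csv (tw0624.csv is just a made-up output name), and the result imports into Excel as a semicolon-separated file.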