Marrco wrote:
>> Thanks! I'll be working on some scripting to maintain a good
>> spamtrap list using yours as a starting point. The list I use now
>> was tediously created
>
> Don't hesitate to ask, should any need arise. But don't forget to
> share your findings and your improved scripts with the list (or use
> the wiki).
OK, here's what I've got so far. I decided to use native Windows commands,
with the exception of one executable program that handles the "unique" half
of "sort unique". Here's the link to the original file I downloaded:
http://golden-triangle.com/UNIQUE.COM
Here's the batch file. I run it as part of my rebuilddb.bat. Edit the file
names and/or directories to suit your own ASSP setup. Note that "*maillog"
matches all of my logs:
------------X---------------
@echo off
if exist tmp.txt del tmp.txt
if exist tmpaddr.txt del tmpaddr.txt
echo Collecting invalid email addresses from the ASSP logs...
findstr /C:"invalid address rejected: " *maillog > invalid.txt
::Get interesting data only
echo.
:: Redirect first so echo does not write a trailing space after each entry
FOR /F "tokens=9 delims= " %%i in (invalid.txt) do >>tmpaddr.txt echo %%i
FOR /F "tokens=1 delims=@" %%i in (tmpaddr.txt) do >>tmp.txt echo %%i
echo Data collected and parsed...
echo.
::Sort list
echo Sorting list...
echo.
sort < tmp.txt > sorted.txt
del tmp.txt
del tmpaddr.txt
:: Keep unique lines
echo Removing duplicate email names...
type sorted.txt | unique > penaltytrapaddresses.txt
del sorted.txt
echo.
echo Finished!
:: /b returns to the calling batch file instead of closing the whole session
exit /b
------------X---------------
Here's a Perl script that will "sort unique" as well:
#!/usr/bin/perl -w
use strict;
sub ltrim($);
# Set to filename of the input list (one entry per line)
my $infile = 'names.txt';
# Set to filename of de-duped file (new file)
my $newfile = 'trapaddresses.txt';
### Shouldn't need to change stuff below here ###
open (IN, "<$infile") or die "Couldn't open input file: $!";
open (OUT, ">$newfile") or die "Couldn't open output file: $!";
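# Slurp in everything, strip leading whitespace, then sort, so that
# lines differing only in leading whitespace are caught as dupes
my @data = sort map { ltrim($_) } <IN>;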
my $n = 0;
# Now go through the data line by line, writing it to output unless it's
# identical to the previous line (in which case it's a dupe)
my $lastline = '';
foreach my $currentline (@data) {
    next if $currentline eq $lastline;
    print OUT ltrim($currentline);
    $lastline = $currentline;
    $n++;
}
close IN; close OUT;
print "Processing complete. In = " . scalar(@data) . " records, Out = $n records\n";
# Left trim function to remove leading whitespace
sub ltrim($)
{
    my $string = shift;
    $string =~ s/^\s+//;
    return $string;
}
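
For anyone who would rather skip the temp files and UNIQUE.COM, the extract
and "sort unique" steps could probably be combined in a single Perl pass.
This is only a rough sketch and has not been run against real ASSP logs.
The helper name trap_names is made up for illustration, the sample lines
are invented, and the pattern assumes the rejected address is the first
thing containing an "@" after the phrase "invalid address rejected:".
Lower-casing the names is my own choice, so that Bob and bob count once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of ASSP): returns the sorted, unique
# local parts (the text before the "@") of the addresses found on
# "invalid address rejected" lines.
sub trap_names {
    my @lines = @_;
    my %seen;
    foreach my $line (@lines) {
        # Assumption: the first address after the phrase is the rejected one.
        next unless $line =~ /invalid address rejected:\s+.*?([^\s<>@]+)@/;
        $seen{lc $1} = 1;    # hash keys are unique, so dupes collapse here
    }
    my @names = sort keys %seen;
    return @names;
}

# Invented sample lines, only to show the idea; real log lines will differ.
my @sample = (
    'Mar-01-07 10:00:01 invalid address rejected: bob@example.com',
    'Mar-01-07 10:00:02 invalid address rejected: alice@example.com',
    'Mar-01-07 10:00:03 invalid address rejected: Bob@example.com',
    'Mar-01-07 10:00:04 some other log line: carol@example.com',
);

# Prints alice, then bob (one per line)
print "$_\n" for trap_names(@sample);
```

To use it on real logs, read the lines from your own log files and pass
them to trap_names in place of @sample, then write the result to
penaltytrapaddresses.txt.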