On Oct 20, Rose, Jeff said:
Basically, what I am trying to do here is parse through an email file,
grab the basics (from/to/subject), and put those in a small tab-separated
database in the format of
File NumRecipients From FromIP Subject Spam-Status
and then pass the contents along to the SpamAssassin Perl module to check
the spam status. But the email file contains these lines, which mess with
SpamAssassin's filtering, and I have to remove them in order to get an
accurate spam score (using the module, not the daemon; don't ask me why :-)):
P I 19-10-2005 21:35:00 0000 ____ ____ < [EMAIL PROTECTED] >
O T
A domain.com [123.12.123.1]
S SMTP [IP ADDRESS]
R W 19-10-2005 21:35:00 0000 ____ _FY_ [EMAIL PROTECTED]
R W 19-10-2005 21:35:00 0000 ____ _FY_ [EMAIL PROTECTED]
R W 19-10-2005 21:35:00 0000 ____ _FY_ [EMAIL PROTECTED]
R W 19-10-2005 21:35:00 0000 ____ _FY_ [EMAIL PROTECTED]
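A minimal sketch of stripping those envelope lines before the message is
handed to SpamAssassin, assuming (from the sample above) that every control
line is a single uppercase letter followed by a space, and that all of them
come before the first real header line. The helper name split_envelope and
the sample addresses are hypothetical; the real spool format may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw spool file into its leading envelope
# control lines (P, O, A, S, R ...) and the message that follows them.
# ASSUMPTION: each control line is one uppercase letter, a space, then
# data, and they all appear before the first header such as "From:".
sub split_envelope {
    my ($raw) = @_;
    my @lines = split /^/, $raw;    # split into lines, keeping newlines
    my @envelope;
    while (@lines && $lines[0] =~ /^[A-Z] /) {
        push @envelope, shift @lines;
    }
    return (\@envelope, join '', @lines);
}

# Made-up sample input for illustration only.
my $raw = <<'END';
P I 19-10-2005 21:35:00 0000 ____ ____ < sender@example.com >
O T
A domain.com [123.12.123.1]
S SMTP [10.0.0.1]
R W 19-10-2005 21:35:00 0000 ____ _FY_ rcpt1@example.com
From: sender@example.com
Subject: hello

Body text.
END

my ($envelope, $message) = split_envelope($raw);
printf "%d envelope lines stripped\n", scalar @$envelope;
print $message;    # this is what would be passed to SpamAssassin
```

Only the leading run of control lines is removed, so a body line that
happens to start with "I " or "A " is left alone.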
You're on the right track, but I think you're doing far too much with the
regex when you should really just split each line on whitespace and deal
with it like that.
my @records;
while (<FILE>) {
  my @fields = split;    # split $_ on whitespace
  next unless @fields;   # skip blank lines
  # sender: a P line starts a new record
  if ($fields[0] eq 'P') {
    push @records, { SENDER => $fields[-2] }; # $fields[-1] is '>'
  }
  # recipient: add to the current record's list
  elsif ($fields[0] eq 'R') {
    push @{ $records[-1]{RECIPIENT} }, $fields[-1];
  }
  # SMTP
  elsif ($fields[0] eq 'S') {
    $records[-1]{SMTP} = $fields[-1];
  }
  # etc.
}
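Here is a self-contained sketch of the same split-and-dispatch loop, reading
made-up sample lines from a string through an in-memory filehandle so it can
be run as is. The FROMIP field taken from the A line is an assumption based
on the "A domain.com [123.12.123.1]" sample; adjust it if that line means
something else in your spool format.

```perl
use strict;
use warnings;

# Sample envelope lines (addresses are made up for illustration).
my $text = <<'END';
P I 19-10-2005 21:35:00 0000 ____ ____ < sender@example.com >
O T
A domain.com [123.12.123.1]
S SMTP [10.0.0.1]
R W 19-10-2005 21:35:00 0000 ____ _FY_ rcpt1@example.com
R W 19-10-2005 21:35:00 0000 ____ _FY_ rcpt2@example.com
END

# Read from the string as if it were a file.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @records;
while (my $line = <$fh>) {
    my @fields = split ' ', $line;    # same whitespace split as bare split
    next unless @fields;              # skip blank lines
    my $type = $fields[0];
    if ($type eq 'P') {
        push @records, { SENDER => $fields[-2] };    # last field is '>'
    }
    elsif (!@records) {
        next;    # ignore anything that appears before the first P line
    }
    elsif ($type eq 'R') {
        push @{ $records[-1]{RECIPIENT} }, $fields[-1];
    }
    elsif ($type eq 'S') {
        $records[-1]{SMTP} = $fields[-1];
    }
    elsif ($type eq 'A') {
        # ASSUMPTION: the bracketed address on the A line is the sender's IP.
        ($records[-1]{FROMIP}) = $fields[-1] =~ /^\[(.*)\]$/;
    }
}
close $fh;

for my $r (@records) {
    printf "%s -> %d recipient(s) via %s, from IP %s\n",
        $r->{SENDER}, scalar @{ $r->{RECIPIENT} || [] },
        $r->{SMTP}, $r->{FROMIP};
}
```

The `elsif (!@records)` guard keeps a stray R or S line at the top of a file
from autovivifying a record that has no sender.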
Now you have an array, @records, whose elements are hash references, one
per email (each P line starts a new one). Here's what it looks like:
@records = (
# email 1
{
SENDER => '...',
RECIPIENT => [ '...', '...' ],
SMTP => '...',
# whatever other fields you want
},
# email 2
{ ... },
);
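Once the records are built, writing the tab-separated database from the
original question is a join over the fields. A minimal sketch follows; the
helper name record_to_tsv is hypothetical, and Subject and Spam-Status are
passed in as plain arguments because they come from the message headers and
from SpamAssassin rather than from the envelope lines. The FROMIP key is
likewise an assumption, not something the loop above already fills in.

```perl
use strict;
use warnings;

# Hypothetical helper: format one record as a line of the tab-separated
# database "File NumRecipients From FromIP Subject Spam-Status".
sub record_to_tsv {
    my ($file, $rec, $subject, $spam_status) = @_;
    my @cols = (
        $file,
        scalar @{ $rec->{RECIPIENT} || [] },    # number of recipients
        $rec->{SENDER} // '',
        $rec->{FROMIP} // '',                   # ASSUMED key name
        $subject       // '',
        $spam_status   // '',
    );
    # A tab or newline inside a value would break the format,
    # so collapse any such run into a single space.
    s/[\t\r\n]+/ /g for @cols;
    return join("\t", @cols) . "\n";
}

# Made-up record for illustration.
my $rec = {
    SENDER    => 'sender@example.com',
    RECIPIENT => [ 'rcpt1@example.com', 'rcpt2@example.com' ],
    FROMIP    => '123.12.123.1',
};
print record_to_tsv('msg0001', $rec, 'hello', 'No');
```

Missing fields become empty columns rather than shifting the others, so every
line keeps the same six columns.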
--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
http://www.perlmonks.org/ % have long ago been overpaid?
http://princeton.pm.org/ % -- Meister Eckhart