> Your script, as written, will dutifully extract the date header, parse
> it, and spool to the wrong directory, because you relied on bad data.
>
> In virtually all cases, it's easier, safer, and more accurate to store
> messages based on the time they arrive, not when they say they were
> sent.
I agree. On lists that I archive, I pre-process a lot of the messages
to do various things (remove certain attachment types, check for known
viruses, etc.) with procmail. I then split the messages up by month
with a procmail recipe as well.
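
The month split is not shown in this message. Purely as an illustration of keying the archive on arrival time, here is a minimal Python sketch that derives a month bucket from the envelope From line. The function name month_folder and the YYYY-MM naming are hypothetical, and it assumes an asctime-style envelope line with no timezone field.

```python
import re

# Assumption: asctime-style envelope line, e.g.
# "From addr Tue Jan  5 12:34:56 2021" (no timezone field).
ENVELOPE_RE = re.compile(
    r"^From\s+\S+\s+[A-Za-z]+\s+([A-Za-z]+)\s+\d+\s+[0-9:]+\s+(\d{4})"
)
MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

def month_folder(envelope_line):
    """Return 'YYYY-MM' for the arrival month, or None if unparseable."""
    m = ENVELOPE_RE.match(envelope_line)
    if not m or m.group(1) not in MONTHS:
        return None
    return f"{m.group(2)}-{MONTHS[m.group(1)]:02d}"
```

Because the bucket comes from the envelope line, a message with a bogus Date: header still lands in the month it actually arrived.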

One pre-processing step you could add renames the original Date: header
to 'Old-Date:' and then creates a new Date: header from the timestamp on
the envelope From line (the 'From ' line, not the From: header). Here's a
procmail recipe that does just that:

TIMEZONE = "-0600 (CST)"

# Note: each pair of brackets below must contain a space and a tab
# (re-enter the tab if it did not survive in transit)
:0
* ^From[ ]+[^ ]+[ ]+\/[^ ]+.*
{ FROMDATE = $MATCH }

:0
* FROMDATE ?? ^()\/[A-Za-z]+
{ WEEKDAY = $MATCH }

:0
* FROMDATE ?? ^[A-Za-z]+\<+\/[A-Za-z]+
{ MONTH = $MATCH }

:0
* FROMDATE ?? ^[A-Za-z]+\<+[A-Za-z]+\>+\/[0-9]+
{ DAY = $MATCH }

:0
* FROMDATE ?? ^[A-Za-z]+\<+[A-Za-z]+\>+[0-9]+\<+\/[0-9:]+
{ TIME = $MATCH }

:0
* FROMDATE ?? ^[A-Za-z]+\<+[A-Za-z]+\>+[0-9]+\<+[0-9:]+\>+\/[0-9]+$
{
  YEAR = $MATCH
  NEWDATE = "$WEEKDAY, $DAY $MONTH $YEAR $TIME $TIMEZONE"

  :0fhw
  | formail -i "Date: $NEWDATE"
}

It's probably easier and more extensible to do the same with perl, though,
since the procmail recipes above already have to spawn an external program
(formail) anyhow:

:0fhw
| perl -e 'while (<STDIN>) { $newdate = "$1, $3 $2 $5 $4 -0600 (CST)" if \
  /^From\s+\S+\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)/; if (/^Date:\s+/ && \
  defined($newdate)) { print "Date: $newdate\nOld-$_"; } else { print $_; } }'
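
For anyone who would rather read the logic spelled out, here is a rough Python sketch of the same header rewrite. It is an illustration under assumptions, not a drop-in replacement: the function name redate_header is made up, the timezone is hard-coded just as in the snippets above, and it expects an asctime-style envelope line such as "From addr Tue Jan  5 12:34:56 2021". Unlike the perl one-liner, it only uses the first envelope line it sees.

```python
import re

# Assumption: envelope lines look like "From addr Tue Jan  5 12:34:56 2021"
# (asctime-style, no zone), which is what the recipes above expect.
ENVELOPE_RE = re.compile(
    r"^From\s+\S+\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

def redate_header(header, timezone="-0600 (CST)"):
    """Return header text with Date: rebuilt from the envelope From line.

    The original Date: field is kept as Old-Date:. If no envelope line is
    seen before the Date: field, the header is returned unchanged.
    """
    newdate = None
    out = []
    for line in header.splitlines(keepends=True):
        m = ENVELOPE_RE.match(line)
        if m and newdate is None:
            weekday, month, day, time, year = m.groups()
            newdate = f"{weekday}, {day} {month} {year} {time} {timezone}"
        if newdate is not None and re.match(r"Date:\s+", line):
            # Emit the new Date: first, then the original renamed to Old-Date:
            out.append(f"Date: {newdate}\n")
            out.append("Old-" + line)
        else:
            out.append(line)
    return "".join(out)
```

To use it as a filter you would pass it the message header read from standard input and write the result back out. Like the perl version, it does nothing when the message has no Date: header at all, and it does not handle a Date: field folded across several lines.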

Now there should be no confusion about dates, nor should it matter what you
split on...

I'm not sure I would recommend using these, but if you're concerned about
people being confused over dates and want to stick with the only known
quantity, this should fit the bill.

Chris