Hey folks,
A while back I was looking for something to take a bunch of mbox
format mail files and archive them in /year/month subdirectories.
Well... I never found anything I really liked, so Sunday I took an
hour and wrote one. What took the longest was finding all the info
about the various Python methods, etc. For someone who does not do
any Python programming on a regular basis, I swear Python makes things
too easy.... This has become my language of choice when I need a
homebrew solution. I took a couple of ideas from a script that I
found on the web, but the script is nearly from scratch.
Any critiques are welcome, but this was just a quick and dirty script.
The only problem I found was that various MTAs have poor date formats,
so some mail ends up in a subdirectory of 02 instead of 2002, etc.
Maybe I will address that later... My next "todo" is to write a
script that will go through my archives and delete duplicate
messages...
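For the two-digit year problem mentioned above, here is a minimal sketch of one possible fix. It is not part of the script below; the helper name normalize_year is made up for illustration, and the cutoffs follow the obsolete-date rule in RFC 2822 section 4.3 (00-49 means 20xx, 50-99 and three-digit years get 1900 added). It is written for a current Python 3 and should only be applied to years that came from a parsed header, not to the 0 placeholder used for undated mail.

```python
def normalize_year(year):
    """Expand a short year into a four-digit one (hypothetical helper).

    Cutoffs follow RFC 2822 section 4.3; other pivots are equally possible.
    """
    if 0 <= year <= 49:
        # Two-digit years 00-49 are read as 2000-2049.
        return year + 2000
    if 50 <= year <= 999:
        # Two-digit years 50-99 and any three-digit year get 1900 added.
        return year + 1900
    # Already a four-digit year: leave it alone.
    return year


if __name__ == "__main__":
    for raw in (2, 99, 102, 2002):
        print(raw, normalize_year(raw))
```

In the script, this would be applied to msg_date[0] before the directory name is built.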
The script is run by simply typing "script_name source_mbox_file
dest_directory". Also, if the message is lacking a date, it goes in
dest_directory/0/0. I ran this on ~1GB of old imap folders and it
took < 15 minutes on a Sun Blade 2000.
Later,
Shannon
#!/usr/bin/env python
# Python 2 script: mailbox.UnixMailbox was removed in Python 3.
import mailbox
import os
import sys

LF = '\x0a'

def main():
    mailboxname_in = sys.argv[1]
    mailboxdir_out = sys.argv[2]
    process_mailbox(mailboxname_in, mailboxdir_out)

def process_mailbox(mailboxname_in, mailboxdir_out):
    mb = mailbox.UnixMailbox(open(mailboxname_in, 'r'))
    msg = mb.next()
    while msg is not None:
        # Prefer the Date header, then Sent; undated mail goes to 0/0.
        msg_date = msg.getdate('Date') or msg.getdate('Sent') or (0, 0)
        # Destination is dest_directory/year/month/.
        msg_dir = (mailboxdir_out + '/' + str(msg_date[0]) + '/' +
                   str(msg_date[1]) + '/')
        # msg.fp is positioned at the body once the headers are parsed.
        document = msg.fp.read()
        write_message(msg, document, msg_dir)
        msg = mb.next()

def write_message(msg, document, msg_dir):
    fname = msg_dir + 'mail.mbox'
    if not os.path.exists(msg_dir):
        os.makedirs(msg_dir)
    # Append so that every message for the same month shares one mbox file.
    fout = open(fname, 'a')
    fout.write(msg.unixfrom)
    for i in msg.headers:
        fout.write(i)
    fout.write(LF)
    fout.write(document)
    fout.close()

if __name__ == '__main__':
    main()
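
For anyone on a current Python 3, where mailbox.UnixMailbox is no longer available, the same idea can be sketched roughly as follows. This is an assumption-laden rewrite, not the original script: it uses mailbox.mbox and email.utils.parsedate from the standard library, and the function names target_dir and archive are made up for illustration. It keeps the same layout (dest_directory/year/month/mail.mbox, with undated mail in dest_directory/0/0).

```python
import mailbox
import os
import sys
from email.utils import parsedate


def target_dir(msg, dest_root):
    """Return dest_root/year/month for a message, or dest_root/0/0 if undated."""
    parsed = None
    for header in ("Date", "Sent"):
        value = msg.get(header)
        if value:
            # parsedate returns a 9-tuple, or None if the date is unparseable.
            parsed = parsedate(str(value))
        if parsed:
            break
    year, month = (parsed[0], parsed[1]) if parsed else (0, 0)
    return os.path.join(dest_root, str(year), str(month))


def archive(source_mbox, dest_root):
    """Append every message in source_mbox to dest_root/year/month/mail.mbox."""
    outputs = {}  # one open output mailbox per year/month directory
    src = mailbox.mbox(source_mbox, create=False)
    try:
        for msg in src:
            out_dir = target_dir(msg, dest_root)
            if out_dir not in outputs:
                os.makedirs(out_dir, exist_ok=True)
                outputs[out_dir] = mailbox.mbox(os.path.join(out_dir, "mail.mbox"))
            outputs[out_dir].add(msg)
    finally:
        for box in outputs.values():
            box.close()  # close() flushes pending writes
        src.close()


if __name__ == "__main__" and len(sys.argv) == 3:
    archive(sys.argv[1], sys.argv[2])
```

It is invoked the same way as the original: "script_name source_mbox_file dest_directory". Keeping the output mailboxes open in a dictionary avoids reopening a file for every message, at the cost of one open file per year/month directory.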