I'm doing something very similar, but with maildirs, and it runs
automatically every month, so there's no need to parse out dates. With the
maildir format, every message is in its own file within a directory on the
server corresponding to its IMAP directory. So, every month, I just have
a shell script run by cron that moves all the files from the Inbox
into a directory named MonthYear. Since it's just a local filesystem
move, it takes right around zero actual time. Each month is about 100,000
messages, so it's really efficient that way. Now if only I could find a
faster way of retrieving the messages. IMAP on maildirs is pretty fast,
but with that many messages it still gets bogged down and takes several
minutes to index everything when I open one of the folders. So
far, Evolution seems to handle it pretty gracefully, but any other client
usually dies or just takes a very long time.
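For anyone who wants to try the same monthly move, here is a minimal Python sketch of the idea. It is not the cron script described above, which is not shown. The function name `archive_inbox`, its parameters, and the folder name built from the abbreviated month plus the year are assumptions for illustration. It also assumes the usual maildir layout (`cur`, `new`, `tmp`) and that source and destination are on the same filesystem, so each move is only a rename.

```python
import os
import time


def archive_inbox(inbox, archive_root, when=None):
    """Move all message files from the maildir `inbox` into a maildir
    named MonthYear under `archive_root`. Returns the number moved.

    Hypothetical helper: the names and the "%b%Y" folder format are
    assumptions, not taken from the original shell script.
    """
    when = time.localtime() if when is None else when
    dest = os.path.join(archive_root, time.strftime("%b%Y", when))

    # A maildir folder needs cur/, new/ and tmp/ to be valid.
    for sub in ("cur", "new", "tmp"):
        os.makedirs(os.path.join(dest, sub), exist_ok=True)

    moved = 0
    # tmp/ holds in-progress deliveries, so it is left alone.
    for sub in ("cur", "new"):
        src_dir = os.path.join(inbox, sub)
        if not os.path.isdir(src_dir):
            continue
        for name in os.listdir(src_dir):
            # Same filesystem, so this is a rename, not a copy.
            os.rename(os.path.join(src_dir, name),
                      os.path.join(dest, sub, name))
            moved += 1
    return moved
```

Run from cron once a month, something like `archive_inbox("/path/to/Maildir", "/path/to/archive")` would do the move; both paths here are placeholders.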

Shannon Roddy wrote:
>Hey folks,
>
>A while back I was looking for something to take a bunch of mbox
>format mail files and archive them in /year/month subdirectories.
>Well... I never found anything I really liked, so Sunday I took an
>hour and wrote one. What took the longest was finding all the info
>about the various python methods, etc. For someone who does not do
>any python programming on a regular basis, I swear python makes things
>too easy.... This has become my language of choice when I need a
>homebrew solution. I took a couple of ideas from a script that I
>found on the web, but the script is nearly from scratch.
>
>Any critiques are welcome, but this was just a quick and dirty script.
>The only problem I found was that various MTAs have poor date formats,
>so some mail ends up in a subdirectory of 02 instead of 2002, etc.
>Maybe I will address that later... My next "todo" is to write a
>script that will go through my archives and delete duplicate
>messages...
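One possible way to handle the short-year problem mentioned above is to expand the year before building the directory name. The sketch below follows the rule in RFC 2822 section 4.3 (00 to 49 means 20xx; 50 to 99 and three-digit years get 1900 added). The helper name `normalize_year` is hypothetical and not part of the quoted script. It should only be applied to a year that came from a parsed date, since the script uses 0 to mean "no date at all".

```python
def normalize_year(year):
    """Expand a 2- or 3-digit year from a parsed Date header to 4 digits.

    Hypothetical helper following RFC 2822 section 4.3. Do not call it
    on the (0, 0) placeholder used for messages without a date.
    """
    if 0 <= year <= 49:
        return year + 2000      # e.g. 02 -> 2002
    if 50 <= year <= 999:
        return year + 1900      # e.g. 99 -> 1999, 102 -> 2002
    return year                 # already a full year
```

With this in place, a message dated "02" would land in the 2002 subdirectory alongside the correctly dated mail.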
>
>The script is run by simply typing "script_name source_mbox_file
>dest_directory". Also, if the message is lacking a date, it goes in
>dest_directory/0/0. I ran this on ~1GB of old imap folders and it
>took < 15 minutes on a Sun Blade 2000.
>
>Later,
>Shannon
>
>#!/usr/bin/env python
>
>import mailbox, rfc822
>import sys, os, string, re, os.path
>
>LF = '\x0a'
>
>def main():
>    mailboxname_in = sys.argv[1]
>    mailboxdir_out = sys.argv[2]
>    process_mailbox (mailboxname_in, mailboxdir_out)
>
>def process_mailbox (mailboxname_in, mailboxdir_out):
>    mb = mailbox.UnixMailbox (file(mailboxname_in,'r'))
>    msg = mb.next()
>    while msg is not None:
>        if msg.getdate('Date') is not None:
>            msg_date = msg.getdate('Date')
>        elif msg.getdate('Sent') is not None:
>            msg_date = msg.getdate('Sent')
>        else:
>            msg_date = (0 , 0)
>        msg_dir = (mailboxdir_out + '/' + str(msg_date[0]) + '/' +
>                   str(msg_date[1]) + '/')
>        document = msg.fp.read()
>        write_message (msg, document, msg_dir)
>        msg = mb.next()
>
>def write_message (msg, document, msg_dir):
>    fname = msg_dir + 'mail.mbox'
>    if not os.path.exists(msg_dir):
>        os.makedirs(msg_dir)
>    fout = file(fname, 'a')
>    fout.write(msg.unixfrom)
>    for i in msg.headers:
>        fout.write (i)
>    fout.write(LF)
>    fout.write(document)
>    fout.close()
>
>if __name__ == '__main__':
>    main()
>
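The quoted script relies on `rfc822`, `mailbox.UnixMailbox`, and `file()`, none of which exist in Python 3. Below is a rough sketch of the same idea using the current standard library (`mailbox.mbox` and `email.utils.parsedate`). The function names `year_month` and `archive_mbox` are made up for this example, and it is a port under my own assumptions, not the original author's code. As far as I know, `parsedate` already expands two-digit years, so the "02 instead of 2002" problem should mostly go away, but I have not checked that against real archives.

```python
import mailbox
import os
from email.utils import parsedate


def year_month(msg):
    """Return (year, month) from the Date or Sent header, else (0, 0)."""
    for header in ("Date", "Sent"):
        value = msg.get(header)
        # str() guards against Header objects for non-ASCII headers.
        parsed = parsedate(str(value)) if value else None
        if parsed is not None:
            return parsed[0], parsed[1]
    return 0, 0


def archive_mbox(mbox_in, dir_out):
    """Append each message to dir_out/<year>/<month>/mail.mbox.

    Returns the number of messages written. Hypothetical sketch.
    """
    source = mailbox.mbox(mbox_in, create=False)
    dests = {}  # (year, month) -> open destination mbox
    count = 0
    try:
        for msg in source:
            key = year_month(msg)
            if key not in dests:
                msg_dir = os.path.join(dir_out, str(key[0]), str(key[1]))
                os.makedirs(msg_dir, exist_ok=True)
                dests[key] = mailbox.mbox(os.path.join(msg_dir, "mail.mbox"))
            dests[key].add(msg)
            count += 1
    finally:
        # close() flushes pending messages to disk.
        for dest in dests.values():
            dest.close()
        source.close()
    return count
```

Calling `archive_mbox(source_mbox_file, dest_directory)` mirrors the command-line usage described in the message. Keeping one open destination per year/month avoids re-scanning the growing output file for every message.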