I wanted to experiment with having one mail message per file, but I'm
sad to report that, on my kernel, ext3fs becomes unusably slow with
random access to hundreds of thousands of files, even with
`dir_index`, even if the branching factor is only 100 files per
directory.  (I'm using Linux 2.6.18-6-686 from Debian Etch with
dm-crypt filesystem encryption on a year-1999 700MHz PIII laptop.
Maybe it wouldn't be a problem with a real computer, or a more recent
kernel.)
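
To check whether another machine has the same problem, a rough sketch like the following could time random reads over a directory tree. The function name `time_random_reads` and the default sample size are made up for illustration and are not from the original experiment. Because `os.walk` visits every directory before the timing starts, the number it reports is optimistic compared to truly cold random access.

```python
import os, random, time

def time_random_reads(root, sample_size=1000):
    # Hypothetical helper: gather every file path under root.
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    # Read a random sample of the files and time only the reads.
    sample = random.sample(paths, min(sample_size, len(paths)))
    start = time.time()
    for path in sample:
        with open(path, 'rb') as f:
            f.read()
    return len(sample), time.time() - start
```

Calling `time_random_reads('some-output-dir')` returns the number of files read and the elapsed seconds; dividing one by the other gives a crude per-file cost.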

An earlier version of this code was responsible for the directory with
800 000 files that I mentioned in a previous kragen-hacks post, which
included a shell script to remove that directory.  (I had given up on
`rm -rf` after 8 hours.)  This version doesn't stress the filesystem
quite as badly.

What I had in mind was to try checking my mailbox into Git, to get
better compression, faster syncing, and automatic detection and
correction of data corruption.  The 1.4 version of Git I have doesn't
seem to perform reasonably on the task, but maybe that's because the
underlying filesystem is sucking rocks.

I probably ought to try running this on murdererfs and see if it
performs better; after all, this is the kind of thing it's made for,
right?

Like everything else posted to kragen-hacks without any notice to the
contrary, this program is in the public domain; I abandon any
copyright in it.

#!/usr/bin/python
import sys, os, errno

def makedirs(name):
    # Create the directory and any missing parents; ignore "already exists".
    try: os.makedirs(name)
    except OSError as e:
        if e.errno != errno.EEXIST: raise

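class Output:
    def __init__(self, dirname):
        self.dirname = dirname
        self.counter = 0
        self.fo = None          # opened on first write, so no empty message-1
    def go_to_new_file(self):
        self.close()            # don't leave the previous message open
        self.counter += 1
        # Two levels of fan-out keyed on the low digits of the counter.
        dirname = '%s/%02d/%04d' % (self.dirname,
                                    self.counter % 100,
                                    self.counter % 10000)
        filename = '%s/message-%s' % (dirname, self.counter)
        makedirs(dirname)
        self.fo = open(filename, 'w')
    def write(self, data):
        if self.fo is None: self.go_to_new_file()
        self.fo.write(data)
    def close(self):
        if self.fo is not None:
            self.fo.close()
            self.fo = None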

def split(mbox, output):
    for line in mbox:
        if line.startswith('From '):
            output.go_to_new_file()
        output.write(line)
    output.close()

if __name__ == '__main__':
    split(sys.stdin, Output(sys.argv[1]))
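
To see where a given message ends up, here is a small sketch that reproduces the path computation from `go_to_new_file` as a standalone function. The name `message_path` and the directory name `mail` are only for illustration; neither is part of the program above.

```python
def message_path(dirname, counter):
    # Same format strings as Output.go_to_new_file.
    subdir = '%s/%02d/%04d' % (dirname, counter % 100, counter % 10000)
    return '%s/message-%s' % (subdir, counter)

for n in (1, 100, 12345, 800000):
    print(message_path('mail', n))
# mail/01/0001/message-1
# mail/00/0100/message-100
# mail/45/2345/message-12345
# mail/00/0000/message-800000
```

Since the last two digits of `counter % 10000` always equal `counter % 100`, each top-level directory holds 100 subdirectories, for 10 000 leaf directories in all; 800 000 messages would put 80 files in each leaf.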
