i pushed out a small update to upas/fs that should be a big help for mailboxes with >5000 messages. with my test mailbox of 8868 messages, the old fs requires 41mb to read the mailbox and the new just 12mb. it also shaved the startup time from 18s to 3s.
the gnarly details: upas/fs uses 1 Hash structure per file. the parent message and each mime subpart all have 31 Hash entries, one for the part (e.g. "1") and 30 for the various message fields presented as files (e.g. "subject"). so for this mailbox we have a minimum of 274000 Hash entries, and more likely about half a million. the trick (or maybe hack) is to use a placeholder to represent each of the 30 message fields. since there are no optional message fields and directory generation already assumes it knows what they are, this is no loss of generality. even gnarlier details: with only 1277 buckets, our hash table has a load factor α = ~370. wikipedia http://en.wikipedia.org/wiki/Hash_table#Load_factor claims that chained hash tables remain linear regardless of load factor, but i think it is only talking about lookups. (the text is not clear to me.) it fails to mention that insertion of each element is O(α/2). so building the hash table is O(α²/4). - erik
