i pushed out a small update to upas/fs that should be a big help
for mailboxes with >5000 messages.  with my test mailbox of
8868 messages, the old fs requires 41mb to read the mailbox
and the new just 12mb.  it also shaved the startup time from
18s to 3s.

the gnarly details:

upas/fs uses 1 Hash structure per file.  the parent message
and each mime subpart all have 31 Hash entries, one for the
part (e.g. "1") and 30 for the various message fields presented
as files (e.g. "subject").  so for this mailbox we have a minimum
of 274000 Hash entries, and more likely about half a million.

the trick (or maybe hack) is to use a placeholder to represent
each of the 30 message fields.  since there are no optional
message fields and directory generation already assumes it knows
what they are, this is no loss of generality.

even gnarlier details:

with only 1277 buckets, our hash table has a load factor α = ~370.
wikipedia http://en.wikipedia.org/wiki/Hash_table#Load_factor
claims that chained hash tables remain linear regardless
of load factor, but i think it is only talking about lookups.
(the text is not clear to me.)  it fails to mention that insertion of
each element is O(α/2).  so building the hash table
is O(α²/4). 

- erik

Reply via email to