* On 2007.02.05, in <[EMAIL PROTECTED]>,
*	"Thomas Roessler" <[EMAIL PROTECTED]> wrote:
> I'm inclined to suggest that we turn on the header cache for *all*
> folder types, not just for maildirs -- or at least to do some
> performance testing as to whether there's a way to activate it for
> mbox folders that would make parsing these much faster.

I would like to see that.  I'm afraid I can't spare time now -- many
projects open at once -- but perhaps I could at some point if it's not
already adopted.

Nonetheless, maybe it's meaningful to talk about strategy.  T. Glanzmann
has implied that caching mbox is not doable[1] because of cache
consistency concerns, presumably because mbox messages aren't discrete
and aren't each associated with a unique filestore object that carries
change metadata.

I think a "good enough" solution can be reached by storing message byte
offsets in the cache db, each with a checksum/hash of the N bytes
following that offset.  From offset deltas you can deduce the message
length in the real mbox file (the cache may already know the length),
and the hash over length N gives you a probability of roughly N/length
(the fraction of the message that the hash covers) that the message has
not been externally modified.  If N is equal to the header length, then
that's equal to Maildir's confidence, but N can vary (at cost of
confidence) if it improves performance.  N can be different for each
message, and cached.

Performance of an mbox cached in this way is probably not notably
greater (if at all) than uncached where messages are small (under one or
two read blocks), but in an mbox folder with large messages, the seek
should improve performance somewhat.  The test data would be
interesting, anyway.

[1] Message-id: <[EMAIL PROTECTED]>

-- 
 -D.    [EMAIL PROTECTED]    NSIT    University of Chicago
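
To make the offset-plus-hash idea above concrete, here is a rough
sketch in Python.  It is only an illustration under stated assumptions,
not how mutt's header cache works: the function names
(message_offsets, build_cache, entry_is_valid), the dict layout, the
use of SHA-1, and the naive "line starts with 'From '" message split
are all hypothetical choices made for the example.

```python
import hashlib
import io
from typing import BinaryIO


def message_offsets(data: bytes) -> list[int]:
    """Byte offsets of lines starting with 'From ' (naive mbox split, assumed)."""
    offsets = []
    pos = 0
    while pos < len(data):
        if data.startswith(b"From ", pos):
            offsets.append(pos)
        nl = data.find(b"\n", pos)
        if nl == -1:
            break
        pos = nl + 1
    return offsets


def build_cache(data: bytes, n: int = 64) -> list[dict]:
    """One cache entry per message: offset, length, hashed span N, and digest."""
    offsets = message_offsets(data)
    ends = offsets[1:] + [len(data)]  # offset deltas give each message length
    cache = []
    for start, end in zip(offsets, ends):
        length = end - start
        span = min(n, length)  # N may differ per message and is stored
        digest = hashlib.sha1(data[start:start + span]).hexdigest()
        cache.append({"offset": start, "length": length,
                      "n": span, "digest": digest})
    return cache


def entry_is_valid(mbox: BinaryIO, entry: dict) -> bool:
    """Seek to the cached offset and re-hash only the N cached bytes."""
    mbox.seek(entry["offset"])
    chunk = mbox.read(entry["n"])
    return (len(chunk) == entry["n"]
            and hashlib.sha1(chunk).hexdigest() == entry["digest"])


if __name__ == "__main__":
    sample = (b"From a@x Mon Jan  1 00:00:00 2007\nSubject: one\n\nbody one\n\n"
              b"From b@y Mon Jan  1 00:00:01 2007\nSubject: two\n\nbody two\n")
    cache = build_cache(sample, n=16)
    print(all(entry_is_valid(io.BytesIO(sample), e) for e in cache))
```

Note the trade-off the sketch exposes: a change inside the first N bytes
of a message invalidates its entry, but a change beyond those N bytes
goes unnoticed, which is the "cost of confidence" when N is small.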
