Gordon Messmer writes:
I was thinking about possible designs for keyword implementations that would reduce IO. I wondered if storing keywords in files were not a design that suffers the same problems that mbox did.
Not really. With mbox, any change to the status of any message in the mailbox necessitates the rewrite of the entire mbox file. Some optimizations were possible, but, on average, you'll need to rewrite half the file. Furthermore, if the imapd process was killed, you're left with a corrupted mbox file. That really is what kills mbox. It's not reliable and is subject to corruption.
This is not applicable to Courier's keyword file due to the way the imapkeywords file is rewritten. Furthermore, the keywords metadata only needs to be updated whenever keywords are updated, which happens much less often than updates of message status.
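The message does not spell out how the imapkeywords file is rewritten. As an illustration only, here is a minimal sketch of one common crash-safe pattern for replacing a small metadata file: write the new contents to a temporary file in the same directory, then rename it over the old one. The function name `rewrite_atomically` is hypothetical, and this is an assumption about the general technique, not Courier's actual code.

```python
import os
import tempfile

def rewrite_atomically(path, data):
    """Replace the file at `path` with `data` without exposing a partial file."""
    directory = os.path.dirname(path) or "."
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the new contents to disk first
        # Readers see either the old file or the new one, never a mix.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

If the process is killed before the rename, the old file is still intact, which is the property mbox lacks when it is rewritten in place.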
What if, rather than the current implementation, each Maildir had an "imapkeywords" directory. This directory may contain sub-directories whose names represent the keywords that have been set on messages in the Maildir. When a keyword is set on a message, a hard link (or symbolic link?) is created in the appropriate directory; the name of the link should be the same as the file containing the message, minus the flags at the end of its name.
So, a FETCH STATUS on each message now needs to stat() each keyword subdirectory, for the presence of the file.
I would imagine that building a list of files with keywords should go something like:

* scan the keywords directories and create a list of keywords in the Maildir
* scan each keyword directory and create a hash containing the names of message file links
* scan the cur/ directory for message files. For each one, check all of the keyword hashes for a match against the file name minus the flags at the end. If there is a match, record that the message was tagged with that keyword.

It should be possible to scan each directory using only readdir(), to reduce the IO associated with calling stat() on an indefinite number of message files.

Do you think that such a design is possible, Sam?
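To make the proposal concrete, the three scanning steps can be sketched roughly as follows. This is only an illustration of the proposed directory layout, not anything Courier implements; the function names `strip_flags` and `scan_keywords` are hypothetical, and the sketch assumes that everything inside "imapkeywords" is a keyword directory.

```python
import os

def strip_flags(filename):
    # Maildir message names carry their flags after the ":" (as in ":2,S");
    # the proposal names each link after the part before it.
    return filename.split(":", 1)[0]

def scan_keywords(maildir):
    """Map each message file in cur/ to the sorted list of keywords set on it."""
    kwroot = os.path.join(maildir, "imapkeywords")
    # Steps 1 and 2: list the keyword directories, then build one set of
    # link names per keyword. os.listdir() reads directory entries only;
    # it does not stat() the individual files.
    tagged = {}
    for keyword in os.listdir(kwroot):
        tagged[keyword] = set(os.listdir(os.path.join(kwroot, keyword)))
    # Step 3: check every message in cur/ against every keyword set.
    result = {}
    for name in os.listdir(os.path.join(maildir, "cur")):
        base = strip_flags(name)
        keywords = sorted(k for k, names in tagged.items() if base in names)
        if keywords:
            result[name] = keywords
    return result
```

Note that every call walks all keyword directories and all of cur/, even when only one message's keywords are wanted.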
The problem with this is that, in practice:

1) You almost never need to retrieve the keywords of all messages in a folder, just the keywords set for a specific message.

2) The overhead of this is somewhat higher than just reading a small number of files, and parsing them.

3) The difference in overhead is magnified by the fact that you'll need to repeat the process with every NOOP command, which the client sends to request changes to the status of any message in the folder.
My gut feeling is that this approach actually results in more I/O. Message filenames tend to be longer than keyword names. Given a message filename F, and keywords K(1)..K(n), in your proposal, the baseline datum that represents those keywords set for the message, excluding all other overhead, is length(F)*n. Each keyword directory stores filename F, so that's how many bytes there are to read. Right now, the baseline datum that represents the same keywords would be, approximately: length(F) + n*2, which is significantly less. Here's why. Here's a keyword file in one of my folders:
$Label1
$Label3
1214751906.M126193P4806V0000000000000901I0000000000220B54_0.commodore.email-scan.com,S=5657:1
1214752505.M593345P5048V0000000000000901I0000000000237DE5_0.commodore.email-scan.com,S=3460:0
1214770505.M648167P19061V0000000000000901I0000000000237DEE_0.commodore.email-scan.com,S=3550:0
1214958905.M700616P29551V0000000000000901I0000000000237E6E_0.commodore.email-scan.com,S=2455:0

The keyword file lists the names of all the keywords once, and the keywords are assigned to messages by listing their index number, not name. In the example above, the first message has $Label3 set (keyword 1), and the rest have $Label1 set (keyword 0).
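A minimal sketch of reading a file laid out this way, to show how the index numbers resolve back to keyword names. The function name `parse_keyword_file` is hypothetical and this is not Courier's parser. Two details are assumptions taken only from the example: a line is treated as a message entry when it contains a ":", and a message with several keywords is assumed to list its indexes separated by whitespace.

```python
def parse_keyword_file(text):
    """Return (keywords, messages); messages maps filename -> keyword names."""
    keywords = []   # keyword names in file order; list position is the index
    messages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" in line:
            # Assumed message entry: "<filename>:<index> [<index> ...]"
            name, _, indexes = line.rpartition(":")
            messages[name] = [keywords[int(i)] for i in indexes.split()]
        else:
            # Otherwise assume the line names a keyword.
            keywords.append(line)
    return keywords, messages
```

Each keyword name is stored once, no matter how many messages carry it; every message only repeats a short index.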
When keywords are in heavy use, this is a very compact mechanism for saving the keyword metadata.
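The two baseline estimates from above, length(F)*n for the directory proposal and roughly length(F) + n*2 for the current file, can be compared directly. The helper names are hypothetical, and both functions only restate those approximations; they ignore all other overhead, as the estimates themselves do.

```python
def proposal_bytes(filename, n):
    # One link name per keyword directory: the filename is stored n times.
    return len(filename) * n

def current_bytes(filename, n):
    # The filename is stored once, plus roughly two bytes per keyword index.
    return len(filename) + 2 * n

# First message filename from the example keyword file.
F = ("1214751906.M126193P4806V0000000000000901"
     "I0000000000220B54_0.commodore.email-scan.com,S=5657")

for n in (1, 2, 5):
    print(n, proposal_bytes(F, n), current_bytes(F, n))
```

With a single keyword the two figures are about the same; the gap grows with every additional keyword set on a message.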
The I/O issue, I believe, is really not due to how the keyword metadata is actually stored, but rather because of the overall logic. I made some tweaks to the internal logic in 4.4 which should result in less keyword-related I/O as a result of keyword updates.
_______________________________________________ courier-users mailing list [email protected] Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-users
