Timo Sirainen <[email protected]> wrote:
> On 24.1.2011, at 23.17, Sven Hartge wrote:
>> I take this thread and jump in, since we (TH Mittelhessen, Germany) are
>> also investigating the move to Dovecot and we also have the same
>> situation as Javier: Courier with Maildir and Bacula as backup
>> solution, we even have about the same amount of mails in our system.
>>
>> And I was also wondering which storage format to use: stay at Maildir
>> (no need to worry about indexes, just restore straight to the user's
>> $HOME/Maildir and be done with it), use sdbox or use mdbox.

> Probably a good idea to switch to Dovecot+Maildir first, and then when
> everything seems to be working fine switch to mdbox or sdbox.

Of course. Being able to convert just a few mailboxes (probably the ones
from the admins, eating our own dog food, etc.) over to a different
storage method really helps here.

>> "Expunging a message only decreases the message's refcount. The space
>> is later freed in "purge" step. This is typically done in a nightly
>> cronjob when there's less disk I/O activity. The purging first finds
>> all files that have refcount=0 mails. Then it goes through each file
>> and copies the refcount>0 mails to other mdbox files (to the same
>> files as where newly saved messages would also go), updates the map
>> index and finally deletes the original file."
>>
>> For example, we got m.1, m.2 and m.3 and all files have deleted mails
>> in them. During purge, all undeleted mails would go to m.4 and m.5
>> for example.

> Typically only new messages are deleted, so typically it would be only
> m.3 file that had deleted mails.

Probably, yes. But I am trying to prevent a sudden and unpredictable
surge in the backup space needed on a given day. I guess I will have to
experiment with this.

>> Now Bacula backs up the mail storage and has 2 new files to back up and
>> 3 old ones to "delete/forget" (using the accurate backup option).
>>
>> Wouldn't this massively increase the size of the backup because I end
>> up backing up many mails multiple times?
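
To put rough numbers on the scenario above, here is a minimal Python
sketch of the purge step as the quoted description explains it. It is a
toy model, not Dovecot code: Mail, purge, backup_delta, make_files and
the 600-byte file size limit are all made up for illustration. It also
assumes that surviving mails always land in freshly created files and
that the backup works strictly per file.

```python
from dataclasses import dataclass


@dataclass
class Mail:
    uid: int
    size: int          # bytes
    refcount: int = 1  # 0 means expunged, waiting for purge


def purge(files, max_file_size, next_id):
    """Rewrite every file that holds at least one expunged mail.

    files maps a name such as "m.1" to a list of Mail objects.
    Returns the new mapping and the next unused file number.
    Assumption: survivors are copied into newly created files.
    """
    dirty = [name for name, mails in files.items()
             if any(m.refcount == 0 for m in mails)]
    result = {name: mails for name, mails in files.items()
              if name not in dirty}
    current, current_size = None, 0
    for name in dirty:
        for mail in files[name]:
            if mail.refcount == 0:
                continue  # space is reclaimed by not copying the mail
            if current is None or current_size + mail.size > max_file_size:
                current = f"m.{next_id}"
                next_id += 1
                result[current] = []
                current_size = 0
            result[current].append(mail)
            current_size += mail.size
    return result, next_id


def backup_delta(before, after):
    """Bytes a per-file incremental backup copies: all files that are new."""
    return sum(m.size for name in after if name not in before
               for m in after[name])


def make_files(deleted_in):
    """Build m.1..m.3 with four 100-byte mails each; expunge one mail
    in every file whose number is listed in deleted_in."""
    files, uid = {}, 0
    for n in (1, 2, 3):
        mails = []
        for _ in range(4):
            uid += 1
            mails.append(Mail(uid, 100))
        if n in deleted_in:
            mails[0].refcount = 0
        files[f"m.{n}"] = mails
    return files


# Deletions spread over all three files (my worry).
before = make_files({1, 2, 3})
after, _ = purge(before, max_file_size=600, next_id=4)
print(sorted(after), backup_delta(before, after))

# Deletions only in the newest file (Timo's typical case).
before = make_files({3})
after, _ = purge(before, max_file_size=600, next_id=4)
print(sorted(after), backup_delta(before, after))
```

In this model, one expunged mail in each of m.1, m.2 and m.3 moves the
nine surviving mails into m.4 and m.5, so the per-file backup copies 900
bytes although only 300 bytes were freed. With deletions only in m.3 it
copies 300 bytes.
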
> Yes, but if you use mdbox_rotate_interval=1d and run the purging
> before backups, I think there's a good chance that most of the backed
> up mails will be new files that bacula hasn't seen before.

Do you mean "new mails" instead of "new files"? Again, I think I will
have to experiment with this. Starting a new mdbox file based on time
rather than on the number or size of mails is an option I had not yet
thought of.

>> I thought of limiting the amount of mails inside the mdbox to one, thus
>> of course defeating the benefit of having multiple mails inside one
>> file, but gaining a stable file name over the whole lifetime of a mail
>> which will never change, even if the file is moved to a different folder
>> or its state changes.

> Then you'd want to use sdbox, but that won't decrease the backup time
> compared to maildir, since there's the same number of files.

Correct. This is why I am very interested in using a bundled format such
as mdbox. Right now, I am not able to do real full backups, as this would
take about 30 hours. I am limited to VirtualFull backups using the
accurate option from Bacula, which cuts the daily incremental backup
time to about 2 hours.

>> Problem: I may end up with hundreds of thousands of m.* files inside a
>> user's storage area (Don't ask, we really have this kind of user. And
>> no, they are uneducable about this.), even if the user neatly sorted
>> them into different IMAP folders.

> I don't really understand what you're trying to say with this. m.*
> files anyway aren't folder-specific, all of the user's mails are in
> the same m.* files. And users can't really affect how m.* files are
> created, other than deleting messages all around the mailbox.

Yes, exactly. Imagine a user with 100 folders with 1000 mails per
folder: if I kind of abuse mdbox by allowing just one mail per file, I'd
have 100,000 m.* files in the storage area. Not optimal. But this is
just a case of having one's cake and eating it too.
(Hopefully got that proverb right.)

Just thinking: can the storage directory for mdbox be hashed? So you
get, for example,

  <mail location root>/storage/X/Y/m.*

instead of

  <mail location root>/storage/m.*

This way any performance degradation caused by too many files per
directory could be prevented.

Grüße,
Sven.

-- 
Sig lost. Core dumped.
