On Thu, Apr 29, 2010 at 08:44:54AM +1000, Rob Mueller wrote:
> Whether to go seen_db or seen_bigdb, that's trickier. seen_db is
> what almost everyone uses now, but seen_bigdb seems almost sane
> since in most cases, the users own seen state will be in the
> cyrus.index.

That's what I figured...

> There's one issue with seen_bigdb though, you really would have to
> use a real DB (eg bdb or skiplist), not the text file db.

Yes, definitely. We use skiplist for seen_db at the moment anyway.
Also seen_db is what most people use, so it's pretty well tested.

> The other issue I can see, is that seen db is indexed by folder
> unqid. How "unique" are folder id's. They're generated in a pretty
> adhoc fashion, and it's always scared me that it might be too easy
> to generate clashes (when restoring from backups especially), which
> would be especially bad for a seen_bigdb.

It doesn't really matter for a seen_bigdb, because they'll be keyed
by user AND uniqueid - meaning they are no more likely to generate
clashes than they were before under seen_db. Besides, they only
matter within the non-user folders now.

More interesting is the potential for clashes during replication,
which would generate a rename event across users. That could get
super-ugly! But it's not a high risk - the adhoc uniqueid is a hash
of the folder name concatenated with the uidvalidity, so you'd have
to have a hash collision and creation at the same second.

Restore from backup after a rename is the disaster case. The best
way to protect against that is to move the cyrus.header data into a
central DB and scan it for matches before creating an entry. Either
key an "index" db against the uniqueid directly, or just do a full
table scan. The IMAP "LIST" command already does a full table scan,
so it can't be TOO expensive :)

Bron.
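
To make the two ideas above concrete (keying seen state by user AND
uniqueid, and checking a central DB for a uniqueid match before
creating an entry), here is a rough Python sketch. It is not Cyrus
code. Every name in it (make_uniqueid, seen_key, MailboxRegistry) is
hypothetical, the central DB is just a pair of in-memory dicts, and
CRC32 is only a stand-in for whatever hash Cyrus really uses.

```python
import zlib


def make_uniqueid(folder_name, uidvalidity):
    """Hash of the folder name concatenated with the uidvalidity.

    CRC32 is an assumed stand-in, not the hash Cyrus actually uses.
    """
    data = "{}{}".format(folder_name, uidvalidity).encode("utf-8")
    return "{:08x}".format(zlib.crc32(data))


def seen_key(user, uniqueid):
    """Composite seen_bigdb-style key: user AND uniqueid (assumed layout)."""
    return (user, uniqueid)


class MailboxRegistry:
    """Hypothetical central DB holding what cyrus.header stores per folder."""

    def __init__(self):
        self.by_name = {}      # folder name -> uniqueid
        self.by_uniqueid = {}  # "index" db keyed by uniqueid -> folder name

    def find_clash_scan(self, uniqueid):
        """Option 1: full table scan over every folder entry."""
        for name, existing in self.by_name.items():
            if existing == uniqueid:
                return name
        return None

    def find_clash_index(self, uniqueid):
        """Option 2: direct lookup in an index keyed by uniqueid."""
        return self.by_uniqueid.get(uniqueid)

    def create(self, folder_name, uidvalidity):
        """Refuse to create a folder whose uniqueid is already in use."""
        uniqueid = make_uniqueid(folder_name, uidvalidity)
        clash = self.find_clash_index(uniqueid)
        if clash is not None:
            raise ValueError(
                "uniqueid {} already used by {}".format(uniqueid, clash)
            )
        self.by_name[folder_name] = uniqueid
        self.by_uniqueid[uniqueid] = folder_name
        return uniqueid
```

The index lookup and the full scan return the same answer; the index
just trades an extra db to maintain for not walking every entry on
each create.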