Re: Seen databases

Ken Murchison Tue, 04 May 2010 13:00:09 -0700

I've been thinking about this for a while and I keep coming back to the same answer.

seen_local is legacy and I wouldn't expect to find this in the wild anymore. I don't think we should waste cycles doing anything with it.

I don't recall why seen_bigdb was created by one of my predecessors, but it's not used in production at CMU. I don't think it's the way to go, even with your Seen state changes.

The reason is that I think the distributed Seen state offered by seen_db is the best for sites with a large number of shared mailboxes, such as CMU. We currently have over 14,000 shared mailboxes that are called bulletin boards on campus (used to be a lot more when we also had non-binary newsgroups). And we have tens of thousands of users reading these mailboxes and maintaining their own Seen state. Using the current divide-and-conquer approach, where we keep each user's Seen state in a separate database, seems the most sane to me, rather than having several hundred or several thousand handles open to a single database.

Any change that would affect the performance or stability of CMU's current environment would not be a good thing.



Bron Gondwana wrote:

At the moment Cyrus appears to support 3 seen backends:

* seen_local:
    stores all the seen data for all users in a file in
    the spool directory.  Legacy.

* seen_db:
    as far as I can see, everyone uses this.  It's the only
    one that replication's SETSEEN_ALL command works with
    for sure.

* seen_bigdb:
    one single database for ALL users' seen data.

Now - I'm in two minds.  I've already made one HUGE change
to how seen is handled, in that it's a system_flag in the
index record for the owner of the mailbox for user.*
mailboxes now.  Also recentuid is in the index header for
the owner.  This catches 99% of cases, reducing IO, since
compulsory CONDSTORE means we're always updating the
record for seen changes anyway.

So - in most cases there will be no $user.seen file any
more.  I'm wondering if there is actually any benefit in
supporting three different operating modes for seen, or
if we should standardise on one. The choices are either
seen_db (advantage - less can go corrupt if anything
goes wrong) or seen_bigdb (advantage - only one file,
reduces the "stat" call and inode caching cost)

For that matter - if we standardised all $user.sub files
into a subscription.db, we'd cut yet another bunch of
tiny files.  I'll probably leave that one alone for now,
since otherwise these changes will get totally out of
hand...

Speaking of which, I'm probably due to write another
update on how my future branch work is going!

Anyway - the reason I'm writing this is: I can see
that I'm going to need to provide a "seen_user_foreach"
API which calls a function with each given seen record
name... and I'm wondering if I should write 3 or just
not bother and standardise on one.
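
To make the question concrete, here is a minimal sketch of the
shape such an API might take. It is not Cyrus code: the
seen_backend struct, the toy in-memory backend and every
name other than seen_user_foreach are hypothetical, and the
signature of seen_user_foreach itself is a guess. The point
is only that one front-end call can dispatch to a
per-backend iterator, so supporting N backends means writing
N iterators.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: invoked once per seen record name.
 * Returning non-zero stops the iteration early. */
typedef int seen_foreach_cb(const char *userid,
                            const char *recname,
                            void *rock);

/* Hypothetical backend interface (assumption, not Cyrus's). */
struct seen_backend {
    const char *name;
    int (*user_foreach)(const char *userid,
                        seen_foreach_cb *cb, void *rock);
};

/* Toy in-memory data standing in for a real seen store. */
struct toy_record { const char *userid; const char *recname; };

static const struct toy_record toy_records[] = {
    { "alice", "uniqueid-0001" },
    { "bob",   "uniqueid-0002" },
    { "alice", "uniqueid-0003" },
};

/* Toy iterator: visit every record belonging to userid. */
static int toy_user_foreach(const char *userid,
                            seen_foreach_cb *cb, void *rock)
{
    size_t i;
    size_t n = sizeof(toy_records) / sizeof(toy_records[0]);

    for (i = 0; i < n; i++) {
        int r;
        if (strcmp(toy_records[i].userid, userid) != 0) continue;
        r = cb(userid, toy_records[i].recname, rock);
        if (r) return r;   /* callback asked to stop */
    }
    return 0;
}

static const struct seen_backend toy_backend = {
    "toy", toy_user_foreach
};

/* Front end: one entry point, whichever backend is in use. */
int seen_user_foreach(const struct seen_backend *be,
                      const char *userid,
                      seen_foreach_cb *cb, void *rock)
{
    return be->user_foreach(userid, cb, rock);
}

/* Example callback: count the records visited. */
static int count_cb(const char *userid, const char *recname,
                    void *rock)
{
    (void)userid;
    (void)recname;
    ++*(int *)rock;
    return 0;
}
```

A caller would pass a counter as the rock, e.g.
seen_user_foreach(&toy_backend, "alice", count_cb, &n).
Standardising on one backend would collapse the struct to a
single function.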

Bron.


--
Kenneth Murchison
Systems Programmer
Carnegie Mellon University
