I've been thinking about this for a while and I keep coming back to the
same answer.
seen_local is legacy, and I wouldn't expect to find it in the wild
anymore. I don't think we should waste cycles doing anything with it.
I don't recall why one of my predecessors created seen_bigdb, but
it's not used in production at CMU. I don't think it's the way to go,
even with your Seen state changes.
The reason is that I think the distributed Seen state offered by seen_db
is the best for sites with a large number of shared mailboxes, such as
CMU. We currently have over 14,000 shared mailboxes that are called
bulletin boards on campus (used to be a lot more when we also had
non-binary newsgroups). And we have tens of thousands of users reading
these mailboxes and maintaining their own Seen state. Using the current
divide-and-conquer approach, where each user's Seen state is kept in a
separate database, seems far saner to me than having several
hundred or thousand handles open to a single database.
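To make the per-user approach concrete, here is a minimal sketch under stated assumptions: the directory scheme (`user/<first letter>/<userid>.seen`), the seen UID strings such as "1:10", and the names `seen_db_path` and `PerUserSeen` are all hypothetical stand-ins for the real on-disk databases. It only illustrates that a write for one reader never touches another reader's file, even when both read the same bulletin board.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def seen_db_path(config_dir: str, userid: str) -> PurePosixPath:
    """Return a hypothetical per-user Seen database path.

    Users are bucketed by the first character of the userid so that
    no single directory holds every user's file.
    """
    bucket = userid[0].lower() if userid else "_"
    return PurePosixPath(config_dir) / "user" / bucket / f"{userid}.seen"


class PerUserSeen:
    """In-memory stand-in for "one small database per user"."""

    def __init__(self) -> None:
        # userid -> {mailbox name -> seen UID set, e.g. "1:10"}
        self._dbs: dict[str, dict[str, str]] = {}

    def set_seen(self, userid: str, mailbox: str, seen: str) -> None:
        # Only this user's database is written, so readers of the same
        # shared mailbox never contend for a single handle.
        self._dbs.setdefault(userid, {})[mailbox] = seen

    def get_seen(self, userid: str, mailbox: str) -> str:
        # A user with no database yet simply has nothing seen.
        return self._dbs.get(userid, {}).get(mailbox, "")


if __name__ == "__main__":
    store = PerUserSeen()
    store.set_seen("ken", "bboard.test", "1:10")
    store.set_seen("bron", "bboard.test", "1:5")
    print(seen_db_path("/var/imap", "ken"))
    print(store.get_seen("ken", "bboard.test"))
```

Because each user has a separate file, a corrupt or locked database affects only that one user.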
Any change that would affect the performance or stability of CMU's
current environment would not be a good thing.
Bron Gondwana wrote:
At the moment Cyrus appears to support 3 seen backends:
* seen_local:
stores all the seen data for all users in a file in
the spool directory. Legacy.
* seen_db:
as far as I can see, everyone uses this. It's the only
one that replication's SETSEEN_ALL command works with
for sure.
* seen_bigdb:
one single database for ALL users seen data.
Now - I'm in two minds. I've already made one HUGE change
to how seen is handled: for user.* mailboxes, Seen is now a
system_flag in the index record, maintained for the mailbox
owner, and the owner's recentuid is now kept in the index
header. This catches 99% of cases and reduces IO, since
compulsory CONDSTORE means we're always updating the
record for seen changes anyway.
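A minimal sketch of the lookup this implies, assuming a simplified model: `Mailbox`, `IndexRecord`, `is_seen` and the `seen_db` dictionary keyed by `(userid, mailbox)` are hypothetical stand-ins, not Cyrus structures. The owner of a user.* mailbox reads Seen straight from the record's system flags; every other reader still goes through their own seen database.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IndexRecord:
    uid: int
    system_flags: set[str] = field(default_factory=set)


@dataclass
class Mailbox:
    name: str  # internal name, e.g. "user.bron" or "bboard.test"
    records: list[IndexRecord] = field(default_factory=list)

    def owner(self) -> str | None:
        """Return the owning userid for user.* mailboxes, else None."""
        parts = self.name.split(".")
        if len(parts) >= 2 and parts[0] == "user":
            return parts[1]
        return None


def is_seen(mailbox: Mailbox, record: IndexRecord, userid: str,
            seen_db: dict[tuple[str, str], set[int]]) -> bool:
    if mailbox.owner() == userid:
        # Owner: Seen is a system flag on the index record, which is
        # rewritten for CONDSTORE anyway, so no separate seen file.
        return "\\Seen" in record.system_flags
    # Anyone else (shared mailboxes, other users' folders): consult the
    # reader's own seen database.
    return record.uid in seen_db.get((userid, mailbox.name), set())
```

Note that the owner's flag is ignored for any other reader, which is why the separate seen database is still needed for shared mailboxes.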
So - in most cases there will be no $user.seen file any
more. I'm wondering if there is actually any benefit in
supporting three different operating modes for seen, or
if we should standardise on one. The choices are either
seen_db (advantage: less can go corrupt if anything
goes wrong) or seen_bigdb (advantage: only one file,
which reduces the cost of "stat" calls and inode caching).
For that matter - if we standardised all $user.sub files
into a subscription.db, we'd cut yet another bunch of
tiny files. I'll probably leave that one alone for now,
since otherwise these changes will get totally out of
hand...
Speaking of which, I'm probably due to write another
update on how my future branch work is going!
Anyway - the reason I'm writing this is: I can see
that I'm going to need to provide a "seen_user_foreach"
API which calls a function with each given seen record
name... and I'm wondering if I should write 3 or just
not bother and standardise on one.
Bron.
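To show why the number of backends matters for a "seen_user_foreach" API, here is a hedged sketch of two implementations of the same callback interface. The function names, the callback shape `(record name, seen data)` and the composite key `userid\0mailbox` used for the single-database case are assumptions for illustration only.

```python
from __future__ import annotations

from typing import Callable, Dict

# Callback receives (seen record name, seen data).
SeenCallback = Callable[[str, str], None]


def seen_user_foreach_db(per_user: Dict[str, Dict[str, str]],
                         userid: str, cb: SeenCallback) -> None:
    """seen_db style: the user's own database holds only their records."""
    records = per_user.get(userid, {})
    for name in sorted(records):
        cb(name, records[name])


def seen_user_foreach_bigdb(big: Dict[str, str], userid: str,
                            cb: SeenCallback) -> None:
    """seen_bigdb style: one table for everyone, keyed "userid\\0name".

    The NUL separator keeps "ken" from matching "kenneth".
    """
    prefix = userid + "\0"
    for key in sorted(big):
        if key.startswith(prefix):
            cb(key[len(prefix):], big[key])
```

With one database per user the walk is trivial; with a single shared database every caller must encode and filter composite keys (a real sorted store would seek to the prefix rather than scan every key). Each extra backend is another variant of the same function to write and test.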
--
Kenneth Murchison
Systems Programmer
Carnegie Mellon University