(Oswald, thanks for the response. One part of figuring out an unexpected
event is whether it is something unique to your site, or whether it
should have been expected.)
We're still trying to understand what happened and why. We have a
possible theory but want to understand if it is right or not.
We had a planned power outage on Saturday. I decided to take the
opportunity of having everyone off the system to upgrade the imapd
server s/w and, as much as possible, to convert mailbox formats. As it
turned out, we had other issues that kept me from converting much.
2007e was running as of Saturday night and it looked ok. I guess people
were enjoying Mother's Day rather than working on Sunday. On Monday
morning, we saw our load skyrocket to close to 100 and swap space was
high. People were complaining about mail problems, and in particular,
about failures to save copies of outgoing mail (which is an IMAP
action). Somewhere around 11:30AM, we decided to back down to 2006c
(which we'd been running previously).
I didn't expect the load to stay high on 2006c, but it did. It wasn't as
high but was much higher (40s) than we normally experience (typically
under 5). Finally at 4:45PM, the load dropped significantly. (Other than
our admins, we are not a 9-5 type outfit).
Around 1:30PM, I watched my Sent folder after I sent a single message. I
got Thunderbird's "There was an error copying to your Sent folder.
Retry?" message. I didn't respond to the pop-up and just watched. My
sent folder had a lock. A few minutes later, the lock disappeared.
Clearly the system was taking a long time to update the Sent folder.
That was odd. For an mbox folder, I'd have thought that all it had to do
was get the lock, determine the type of folder, and append the message.
That would seem to be fast and independent of the size of the folder,
just as when sendmail adds a message to /var/spool/mail/USER.
(I understand the issue of rewriting an mbox folder when you read a
message, but no message was changing, just a new one being appended.) In
any case, this should have been the same operation it had been doing for
3+ years.
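To make that expectation concrete, here is a minimal sketch of a lock-then-append on an mbox file. It uses Python's standard mailbox module, not c-client, so it only illustrates the operation I had in mind and says nothing about what imapd actually does internally. The function names, the addresses and the temporary "Sent" path are made up for the illustration.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def append_to_mbox(path, msg):
    """Lock an mbox file, append one message, then release the lock."""
    box = mailbox.mbox(path, create=True)
    box.lock()            # advisory lock taken by the mailbox module
    try:
        box.add(msg)      # written at the end of the file
        box.flush()       # only additions are pending, so nothing is rewritten
    finally:
        box.unlock()
        box.close()


def demo_append_count():
    """Append two copies of a message to a scratch mbox and count them."""
    msg = EmailMessage()
    msg["From"] = "user@example.org"      # hypothetical addresses
    msg["To"] = "other@example.org"
    msg["Subject"] = "test"
    msg.set_content("Saved copy of an outgoing message.\n")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "Sent")  # hypothetical folder name
        append_to_mbox(path, msg)
        append_to_mbox(path, msg)
        box = mailbox.mbox(path)
        count = len(box)
        box.close()
    return count
```

Here the cost of the append itself does not depend on how many messages are already in the file, which is why a multi-minute lock on an append looked wrong to me.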
A possible explanation is that when the version changed, some indexing
had to be (re)done by either the server or client. When our users
started using 2007e, the server would have had to do a lot of work to
support that indexing. Then when we panicked and went back to 2006c, all
the folders indexed under 2007e would have to be reindexed. (Actually,
our 2006c wasn't the exact same imapd we had been using -- it was a
newly compiled version.)
I was watching this morning, and I'm seeing specific jobs get big and
use a lot of CPU. One job is 15 CPU minutes and 200MB virtual (and
resident) size. Another is 315MB, and so on. This would be consistent with
what I could imagine for a job indexing a big mail folder.
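For anyone who wants to watch for the same thing, below is a rough sketch of how such jobs could be picked out of ps output. It assumes the Linux procps column layout from `ps -eo pid,vsz,time,comm` (VSZ in KiB, TIME as [dd-]hh:mm:ss). The thresholds, the function names and the sample rows are hypothetical, chosen only to resemble the sizes mentioned above. They are not captured from our server.

```python
def cpu_minutes(time_field):
    """Convert a ps TIME value ([dd-]hh:mm:ss or mm:ss) to minutes."""
    days = 0
    if "-" in time_field:
        day_part, time_field = time_field.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in time_field.split(":")]
    while len(parts) < 3:          # tolerate a bare mm:ss value
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return days * 1440 + hours * 60 + minutes + seconds / 60


def big_processes(ps_text, min_vsz_mb=200, min_cpu_min=10):
    """Return (pid, vsz_mb, cpu_minutes, command) for large or busy jobs."""
    rows = []
    for line in ps_text.strip().splitlines()[1:]:   # skip the header row
        pid, vsz_kb, cpu, comm = line.split(None, 3)
        vsz_mb = int(vsz_kb) / 1024
        minutes = cpu_minutes(cpu)
        if vsz_mb >= min_vsz_mb or minutes >= min_cpu_min:
            rows.append((int(pid), round(vsz_mb), round(minutes, 1), comm))
    return rows


# Hypothetical sample in the assumed column layout.
SAMPLE = """\
  PID    VSZ     TIME COMMAND
 4121 204800 00:15:00 imapd
 4177 322560 00:02:10 imapd
 4302  18432 00:00:01 imapd
"""

for row in big_processes(SAMPLE):
    print(row)
```

In practice the text would come from running the ps command above rather than from a pasted sample, and the thresholds would need tuning to what is normal on a given server.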
If this explanation is correct, then the blow up we saw was an effect of
changing versions as compounded by the (inherent) inefficiency of mbox
and IMAP-UW. If correct, I would recommend posting advice such as
[NOTE: This advice is not necessarily correct!]
"When changing versions of IMAP, each user's client(s) will reindex each
mail folder as they are used. Consequently, if all the users hit a new
mail system at the same time, the server will experience a very heavy
load and high virtual memory usage. Users may experience seriously
impaired performance. The system administrators have no way to alleviate
this for any individual user, but if the set of users being transitioned
can be phased in, the overall load on the system can be spread out over
time."
If/when we dare to go back to 2007e, we'll do it late in the day on a
Friday but while some users are still online. We'll try to get people to
come in over the weekend.
I modified everyone's Sent folders last night to be MIX format
(generally from MBOX). Yes, that will cause reindexing by the clients,
but at least they shouldn't get the same error that we were hearing
about yesterday.
The load average does seem to be in the normal range for this time of day.
On 5/10/10 2:15 PM, Oswald Buddenhagen wrote:
> On Mon, May 10, 2010 at 12:27:06PM -0700, Mabry Tyson wrote:
> > BUT.... Today when the system got under load, our load average shot
> > up to about 100.
>
> i observed the same.
>
> > was done that would impair performance. The only significant
> > changes mentioned are
> > 2006i (COPYUID/APPENDUID)
>
> i was using unix/mbox, so that seemed like a rather plausible
> explanation for the observed behaviour. i tried to patch out the support
> (my use case doesn't need it anyway), but either i did too little or
> that's not the reason after all.
>
> > (I'm trying to move to MIX format,
>
> i just did that and never looked back. :)
>
> i still have the impression that it (*) is eating more cpu than before
> the upgrade, but i have no numbers other than some random samples of
> "ps ux", so this is as solid as jello.
>
> (*) my c-client based filtering mda
_______________________________________________
Imap-uw mailing list
[email protected]
http://mailman2.u.washington.edu/mailman/listinfo/imap-uw