I agree on the fdset part. I guess the best solution (because it is simple and easy to understand) is what you already proposed: simply go on `select`ing.
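To make sure we're talking about the same thing, here is a rough, self-contained sketch of how I understand the proposal. The flag names `in_shutdown` and `sigquit` are taken from your mail; plain `select()` on stdin and `perror()`/`exit()` are just stand-ins for `myselect()` and `fatalf()` so the example runs on its own, and are not meant as the real `master/master.c` code:

    /*
     * Rough sketch only -- not the real master.c code. The flag names
     * in_shutdown and sigquit are taken from Bron's mail; the rest is a
     * stand-in so the retry loop can be shown self-contained.
     */
    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/select.h>
    #include <unistd.h>

    static volatile sig_atomic_t in_shutdown = 0;
    static volatile sig_atomic_t sigquit = 0;

    static void sighandler(int sig)
    {
        if (sig == SIGQUIT) sigquit = 1;
        else in_shutdown = 1;
    }

    int main(void)
    {
        signal(SIGTERM, sighandler);
        signal(SIGQUIT, sighandler);

        for (;;) {
            fd_set rfds;
            struct timeval tv = { 10, 0 };  /* ten second timeout, matching the delays discussed below */
            int r;

            FD_ZERO(&rfds);
            FD_SET(STDIN_FILENO, &rfds);

            /* Keep selecting on EINTR/EAGAIN, but leave the retry loop as
             * soon as a signal asked us to shut down or quit, so those are
             * not delayed by up to the full timeout. */
            do {
                r = select(STDIN_FILENO + 1, &rfds, NULL, NULL, &tv);
            } while (r == -1 && (errno == EAGAIN || errno == EINTR)
                     && !in_shutdown && !sigquit);

            if (in_shutdown || sigquit) break;

            if (r == -1 && !(errno == EAGAIN || errno == EINTR)) {
                perror("select failed");
                exit(EXIT_FAILURE);
            }

            /* Only look at the fd set if select() actually filled it in. */
            if (r > 0 && FD_ISSET(STDIN_FILENO, &rfds)) {
                /* ... message handling, worker accounting, spawning ... */
            }
        }

        return 0;
    }

Checking the flags in the retry condition means a shutdown or quit signal still breaks us out of the `select` immediately, which I think covers your point about the in_shutdown and sigquit magic in that tight loop.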
As far as I can see, the only consequences of your original proposal, putting a small loop around the select statement, are:

- SIGCHLD will be handled slightly later -- I think that's fine.
- SIGHUP will be handled later -- so the administrator has to wait up to ten seconds in the worst case when reloading services. I guess we can live with that pretty well, too.
- Shutdown signals might be late by up to ten seconds in the worst case. Again, I think that's fine.

Or did I miss anything of relevance here?

Regards,
Jens

On 06.02.2016 at 13:00, Bron Gondwana wrote:
> I could buy repeating the whole thing for EINTR but not for EAGAIN.
> Obviously everything is fine under non-heavy load no matter what.
>
> I'm pretty sure your patch is wrong, because it will process the fdset that
> the select was supposed to fill even if the select returned an error.
>
> You may have convinced me that my patch is wrong as well, and that we have to
> always check for the in_shutdown and sigquit magic in that tight loop in case
> it was a signal that needs to be responded to.
>
> Bron.
>
> On Sat, Feb 6, 2016, at 20:52, Jens Erat wrote:
>> Hi Bron,
>>
>> thank you for the fast reply.
>>
>>> One thing I would suggest is running something like nginx in front of
>>> Cyrus, even if you only need it for authentication checks.
>>
>> We'll discuss this internally.
>>
>>> I believe this is the correct patch. If the select was interrupted
>>> then it needs to be re-run, but there is no need to repeat all the other
>>> work, so:
>>>
>>> -        r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
>>> -        if (r == -1 && errno == EAGAIN) continue;
>>> -        if (r == -1 && errno == EINTR) continue;
>>> +        do {
>>> +            r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
>>> +        } while (r == -1 && (errno == EAGAIN || errno == EINTR));
>>
>> From my very basic understanding of EAGAIN, this might be the right
>> thing to do. I'm not sure about EINTR, though.
>>
>> Are you sure we want to keep on waiting when being interrupted? I'm
>> definitely not experienced as a systems and C programmer, but as far as
>> I know EINTR will only/usually be raised if a signal was received. Isn't
>> receiving a signal always an indication that the master has something to
>> do, and that it should go on doing whatever that might be instead of
>> waiting for more messages?
>>
>> Regards,
>> Jens
>>
>> PS: I've added some people to CC who are interested in the conversation
>> but not on the list.
>>
>> On 06.02.2016 at 04:19, Bron Gondwana via Cyrus-devel wrote:
>>> Thanks for the report. I see your point -- there's definitely a DOS
>>> possibility here.
>>>
>>> One thing I would suggest is running something like nginx in front of
>>> Cyrus, even if you only need it for authentication checks.
>>>
>>> I believe this is the correct patch. If the select was interrupted then it
>>> needs to be re-run, but there is no need to repeat all the other work, so:
>>>
>>> -        r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
>>> -        if (r == -1 && errno == EAGAIN) continue;
>>> -        if (r == -1 && errno == EINTR) continue;
>>> +        do {
>>> +            r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
>>> +        } while (r == -1 && (errno == EAGAIN || errno == EINTR));
>>>
>>> I'm quite confident this is correct, so I will roll it out to FastMail's
>>> testing servers today. If they are happy, I'll push it upstream with
>>> backports for 2.5 and 2.4.
>>>
>>> Cheers,
>>>
>>> Bron.
>>>
>>> On Sat, Feb 6, 2016, at 03:58, Jens Erat via Cyrus-devel wrote:
>>>> Dear Cyrus maintainers,
>>>>
>>>> the very short version: we probably found an issue resulting in the
>>>> master stalling high-volume IMAP servers. A possible patch is attached,
>>>> but it needs some more discussion.
>>>>
>>>> - - -
>>>>
>>>> At the University of Konstanz, we're running a rather large IMAP server
>>>> for about 18k users on some large Oracle/Solaris machines. We don't have
>>>> a murder/cluster. The machine is overprovisioned to last for some years
>>>> and runs blazingly fast during normal operation. We're running on
>>>> Solaris 11 and ZFS; the mailboxes are stored on network storage attached
>>>> through Fibre Channel (and we don't observe any I/O issues). The
>>>> dual-socket machine is equipped with 256GB of memory. We tried compiling
>>>> with both GCC 4.8 and Sun CC (from Solaris Studio).
>>>>
>>>> Then, from time to time, it completely stalled -- usually at times with
>>>> a high fork rate. We observed this issue in the morning when lots of
>>>> people started reading their mail, and we experienced it during a DDoS
>>>> attack on our network, which made the firewall drop lots of connections
>>>> (with the mail clients trying to reconnect instantly).
>>>>
>>>> As a result, the mail server was still running fine for everybody still
>>>> holding a connection, but denied service to pretty much all newly
>>>> connecting clients. We had to restart the whole Cyrus IMAPd service to
>>>> recover from this issue.
>>>>
>>>> We started getting a clue of what was going wrong when we added debug
>>>> information and some ctable verifications, as we expected something
>>>> would be wrong with how the active/ready workers are counted. We also
>>>> took time checkpoints during the master loop to see _where_ it was
>>>> stuck. It seemed this was while jumping back from the very last command
>>>> (getting a timestamp in the loop) to the very first (also getting a
>>>> timestamp). The code pretty much looked like:
>>>>
>>>>     struct timeval profiling[11];
>>>>     gettimeofday(&profiling[10], 0); // LOOP DONE
>>>>     for (;;) {
>>>>         gettimeofday(&profiling[0], 0); // INIT
>>>>
>>>>         // Start scheduled tasks
>>>>
>>>>         gettimeofday(&profiling[1], 0); // SCHEDULE
>>>>
>>>>         // Other master loop tasks: spawn, message handling, ...
>>>>
>>>>         gettimeofday(&profiling[9], 0); // SNMP_ALARMS
>>>>
>>>>         syslog(LOG_INFO, "MASTERLOOP_PROFILING: "
>>>>                "%f,%f,%f,%f,%f,%f,%f,%f,%f,%f",
>>>>                timesub(&profiling[10], &profiling[0]),
>>>>                timesub(&profiling[0], &profiling[1]),
>>>>                timesub(&profiling[1], &profiling[2]),
>>>>                timesub(&profiling[2], &profiling[3]),
>>>>                timesub(&profiling[3], &profiling[4]),
>>>>                timesub(&profiling[4], &profiling[5]),
>>>>                timesub(&profiling[5], &profiling[6]),
>>>>                timesub(&profiling[6], &profiling[7]),
>>>>                timesub(&profiling[7], &profiling[8]),
>>>>                timesub(&profiling[8], &profiling[9]));
>>>>
>>>>         gettimeofday(&profiling[10], 0); // LOOP DONE
>>>>     }
>>>>
>>>> The results really puzzled us. What might be the reason that jumping
>>>> back from the end of the loop to its beginning took multiple minutes?
>>>> In the end, by adding another log line at the _beginning_ of the loop,
>>>> we realized that the loop was indeed running very often -- but simply
>>>> never completed. `profiling[10]` never changed while the server was
>>>> stuck.
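>>>>
>>>> timesub here is just a small helper returning the elapsed time between
>>>> two timestamps in seconds; the exact implementation doesn't matter for
>>>> the argument, but roughly:
>>>>
>>>>     /* needs <sys/time.h> for struct timeval */
>>>>     static double timesub(const struct timeval *start,
>>>>                           const struct timeval *end)
>>>>     {
>>>>         return (double)(end->tv_sec - start->tv_sec)
>>>>                + (double)(end->tv_usec - start->tv_usec) / 1000000.0;
>>>>     }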
>>>>
>>>> Knowing this, the problem was obvious: when `select`ing the sockets and
>>>> waiting for messages, the server got interrupted all the time. This
>>>> results in the whole loop starting over from scratch, skipping message
>>>> handling and thus the accounting of available worker daemons, so too
>>>> few of them get spawned.
>>>>
>>>>     r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
>>>>     if (r == -1 && errno == EAGAIN) continue;
>>>>     if (r == -1 && errno == EINTR) continue;
>>>>     if (r == -1) {
>>>>         /* uh oh */
>>>>         fatalf(1, "select failed: %m");
>>>>     }
>>>>
>>>> This is a common pattern if you want to `sleep` or `select`: try to do
>>>> it, and if you get interrupted or told to try again, `continue` and try
>>>> again. But usually this is enclosed in a tiny loop.
>>>>
>>>> On a test machine we were not able to completely reproduce the issue,
>>>> but we have very well been able to observe the message handling part of
>>>> the loop not running for a dozen or more master loop runs. A small patch
>>>> that goes on with the current loop iteration instead of starting over
>>>> from scratch completely resolved the issue (for `master/master.c` in the
>>>> Cyrus IMAP 2.5.7 release tar):
>>>>
>>>>     2481,2483c2481
>>>>     <         if (r == -1 && errno == EAGAIN) continue;
>>>>     <         if (r == -1 && errno == EINTR) continue;
>>>>     <         if (r == -1) {
>>>>     ---
>>>>     >         if (r == -1 && !(errno == EAGAIN || errno == EINTR)) {
>>>>
>>>> This ran fine on our test setup, but we're still scared of patching one
>>>> of the most central flows of logic in a very relevant service.
>>>>
>>>> What is your opinion:
>>>>
>>>> - Message handling was sometimes stuck for minutes. I guess we can agree
>>>>   this should never happen on a high-volume server.
>>>> - Might this be related to the issues we observed?
>>>> - Are there any consequences for the subsequent code (message handling
>>>>   and reaping children) if we do _not_ start from scratch here?
>>>>
>>>> If we get a "this looks fine, and you won't horribly mess up" from the
>>>> developers, we will have a try and patch the production machine as
>>>> proposed.
>>>>
>>>> Kind regards from Lake Constance, Germany,
>>>> Jens Erat

--
Jens Erat
Universität Konstanz
Kommunikations-, Informations-, Medienzentrum (KIM)
Abteilung Basisdienste
D-78457 Konstanz
Mail: jens.e...@uni-konstanz.de
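For reference, here is how the select error handling reads with the one-line change from the original report applied. This is reconstructed from the snippets quoted above, not copied verbatim from `master/master.c`, and the `r > 0` guard only reflects Bron's point about not touching the fd set after an error; it is not part of the quoted patch:

    r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
    /* EAGAIN/EINTR no longer `continue` (which would restart the whole
     * master loop and skip message handling); only other errors are
     * fatal, and the rest of the loop body runs as usual. */
    if (r == -1 && !(errno == EAGAIN || errno == EINTR)) {
        /* uh oh */
        fatalf(1, "select failed: %m");
    }

    if (r > 0) {
        /* fd set is only valid when select() succeeded: handle child
         * status messages, update the ready-worker accounting, ... */
    }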