Hello Ellie, this is exactly what I see (countless pselect calls), but I have as second parameter of pselect a much larger array. I just observed that on killing master, it terminates all cyrus processes but two (httpd and notifyd). Then I try to connect to that processes (gdb). This does not work, however, since the processes are moved to zombie status.
Greetings Дилян On Thu, 2019-11-28 at 10:34 +1100, ellie timoney wrote: > Saw something similar just now when I killed a cassandane run off > prematurely. One cyrus master process wound up spinning like this: > > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > > 0x0000555ac7124a97 in child_janitor (now=...) at master/master.c:1221 > 1221 janitor_position = janitor_position % child_table_size; > (gdb) bt > #0 0x0000555ac7124a97 in child_janitor (now=...) at master/master.c:1221 > #1 0x0000555ac712a67a in main (argc=10, argv=0x7ffdc1fe78b8) > at master/master.c:2812 > > Haven't dug further yet, but it looks similar to your report > > On Wed, Nov 27, 2019, at 9:17 AM, ellie timoney wrote: > > Can you strace the master process next time it's spinning at 100%? > > What is it doing at that time? > > > > On Tue, Nov 26, 2019, at 1:29 AM, Дилян Палаузов wrote: > > > Hello, > > > > > > > I run cyrus imap 3.0.x with some private changes. > > > > > > > > Sometimes when stop the master process, the master process utilizes one > > > > CPU core to 100% for 5 minutes. After the fifth > > > > minute, systemd enforces kill -9. When I attach to the maste process, I > > > > see that it some janitor does some work, but I > > > > have not checked the details. Has anybody experienced this? > > > > > > I run cyrus imap. At some moment I recompile and reinstall the > > > binaries, which in theory means that the binaries > > > detect this change and restart theirselves. At some moment I call > > > "systemctl stop cyrus-imap" which I guess sends > > > SIGTERM to the master process. Then the CPU utilization of the master > > > process goes to 100%. In the systemd service > > > file I have TimeoutStopSec=320 . After this time, the master process > > > continues running and systemd sends 9/SIGKILL. It > > > is not necessary that on re-installing the binaries, and then shutting > > > down the CPU goes to 100%: it is possible that > > > the CPU goes to 100%, without reinstalling (and thus triggering > > > self-restarting) of the imapd/httpd binaries. > > > > > > It is often, but not always, that this 100% CPU loop is entered on > > > shutdown. > > > > > > I have a webmail client and to speedup things it uses SquirrelMail's > > > IMAP Proxy (http://www.imapproxy.org/ a Caching > > > IMAP proxy). It is recommended in the installation manual of > > > Horde/IMP. The IMAP caching proxy connects to > > > 127.0.0.2:143 (and is therefore permitted to skip the TLS overload). > > > In master conf I have a line > > > “imaplocal cmd="imapd -C /usr/local/etc/cyrus/imapdlocal.conf" > > > listen="127.0.0.2:imap" prefork=0”. > > > > > > When the CPU goes to 100% on shutdown I connect with gdb to the master > > > process. Below is the full backtrace. Does > > > somebody have an explanation why the master process enters a never > > > ending loop? > > > > > > I do not say that all above information has to be involved in the > > > anwer. Has somebody else experienced this effects? > > > Any suggestions how to investigate this deeper? > > > > > > Greetings > > > Дилян > > > > > > --- > > > warning: Could not load vsyscall page because no executable was > > > specified > > > Reading symbols from /usr/local/libexec/master... > > > Attaching to program: /usr/local/libexec/master, process 9247 > > > Reading symbols from /usr/local/lib/libcyrus_min.so.0... > > > Reading symbols from /lib/libuuid.so.1... > > > Reading symbols from /usr/local/lib/libgssapi_krb5.so.2... > > > Reading symbols from /usr/local/lib/libkrb5.so.3... > > > Reading symbols from /usr/local/lib/libk5crypto.so.3... > > > Reading symbols from /usr/local/lib/libcom_err.so.3... > > > Reading symbols from /usr/local/lib/libkrb5support.so.0... > > > Reading symbols from /usr/local/lib/libpcreposix.so.0... > > > (No debugging symbols found in /usr/local/lib/libpcreposix.so.0) > > > Reading symbols from /usr/local/lib/libpcre.so.1... > > > (No debugging symbols found in /usr/local/lib/libpcre.so.1) > > > Reading symbols from /usr/local/lib/libxml2.so.2... > > > Reading symbols from /usr/local/lib/liblzma.so.5... > > > (No debugging symbols found in /usr/local/lib/liblzma.so.5) > > > Reading symbols from /usr/local/lib/libical.so.3... > > > Reading symbols from /usr/local/lib/libicalss.so.3... > > > Reading symbols from /usr/local/lib/libicalvcal.so.3... > > > Reading symbols from /usr/local/lib/libicui18n.so.63... > > > Reading symbols from /usr/local/lib/libicuuc.so.63... > > > Reading symbols from /usr/local/lib/libicudata.so.63... > > > (No debugging symbols found in /usr/local/lib/libicudata.so.63) > > > Reading symbols from /usr/local/lib/libsqlite3.so.0... > > > (No debugging symbols found in /usr/local/lib/libsqlite3.so.0) > > > Reading symbols from /usr/local/lib/libz.so.1... > > > (No debugging symbols found in /usr/local/lib/libz.so.1) > > > Reading symbols from /lib64/libm.so.6... > > > Reading symbols from /lib64/libdl.so.2... > > > Reading symbols from /lib64/libpthread.so.0... > > > [Thread debugging using libthread_db enabled] > > > Using host libthread_db library "/lib64/libthread_db.so.1". > > > Reading symbols from /lib64/libc.so.6... > > > Reading symbols from /lib64/ld-linux-x86-64.so.2... > > > Reading symbols from /lib64/libresolv.so.2... > > > Reading symbols from /usr/local/lib/libdb-18.1.so... > > > Reading symbols from /usr/local/lib64/libstdc++.so.6... > > > Reading symbols from /usr/local/lib64/libgcc_s.so.1... > > > Reading symbols from /usr/local/lib64/libssl.so.1.1... > > > Reading symbols from /usr/local/lib64/libcrypto.so.1.1... > > > Reading symbols from /lib64/libnss_db.so.2... > > > Reading symbols from /lib64/libnss_files.so.2... > > > Reading symbols from /lib64/libnss_dns.so.2... > > > 0x0000000000405406 in child_janitor (now=...) at master/master.c:1192 > > > 1192 janitor_position = janitor_position % child_table_size; > > > ™(gdb) bt f > > > Id Target Id Frame > > > * 1 Thread 0x7f6a08759780 (LWP 9247) "master" 0x0000000000405406 in > > > child_janitor (now=...) at master/master.c:1192 > > > #0 0x0000000000405406 in child_janitor (now=...) at > > > master/master.c:1192 > > > i = 9299 > > > p = 0x4132e0 <ctable+16224> > > > c = 0x0 > > > #1 0x0000000000409dd7 in main (argc=4, argv=0x7ffea3075108) at > > > master/master.c:2600 > > > i = 14 > > > ready_fds = 3 > > > total_children = 11 > > > tv = { > > > tv_sec = 0, > > > tv_usec = 0 > > > } > > > msg = { > > > message = 1, > > > service_pid = 28219 > > > } > > > maxfd = 41 > > > tvptr = 0x0 > > > interrupted = 0 > > > pidfile = 0x40c4f0 "/var/run/cyrus-master.pid" > > > pidfile_lock = 0x2135ba0 "/usr/local/etc/cyrus/imapdlocal.conf" > > > startup_pipe = {6, 7} > > > pidlock_fd = -1 > > > i = 14 > > > opt = -1 > > > close_std = 1 > > > daemon_mode = 1 > > > error_log = 0x0 > > > alt_config = 0x0 > > > fd = 3 > > > rfds = { > > > fds_bits = {266272, 0 <repeats 15 times>} > > > } > > > r = 1 > > > now = { > > > tv_sec = 1574690925, > > > tv_usec = 958878 > > > } > > > p = 0x0 > > > quit > > > Detaching from program: /usr/local/libexec/master, process 9247 > > > [Inferior 1 (process 9247) detached] > > > > > > > > > > > >