I'm still digging, but if you amend your log patch to also NULL out dh_params after it frees it, that would at least prevent it trying to free it over and over:
#if (OPENSSL_VERSION_NUMBER >= 0x0090800fL) - if (dh_params) DH_free(dh_params); + if (dh_params) { + syslog(LOG_CRIT, "dh_params will be freed %p", dh_params); + DH_free(dh_params); + dh_params = NULL; + syslog(LOG_CRIT, "dh_params were freed %p", dh_params); + } #endif On Fri, Jul 13, 2018, at 12:47 PM, ellie timoney wrote: > > so it seems to me that the dh_params were set once on startup but > > freed for each closed connection > > Yikes :o > > On Thu, Jul 12, 2018, at 7:09 PM, Michael Menge wrote: > > Hi, > > > > Дилян had suggested to add some debug outputs to imap/tls.c > > > > -------- > > > > diff --git a/imap/tls.c b/imap/tls.c > > --- a/imap/tls.c > > +++ b/imap/tls.c > > @@ -893,7 +893,9 @@ EXPORTED int tls_init_serverengine(const char > > *ident, > > > > #if (OPENSSL_VERSION_NUMBER >= 0x0090800fL) > > /* Load DH params for DHE-* key exchanges */ > > + syslog(LOG_CRIT, "dh_params will be set, current value=%p", dh_params); > > dh_params = load_dh_param(server_key_file, server_cert_file); > > + syslog(LOG_CRIT, "dh_params were set current_value=%p", dh_params); > > SSL_CTX_set_tmp_dh(s_ctx, dh_params); > > #endif > > > > @@ -1308,7 +1310,11 @@ EXPORTED int tls_shutdown_serverengine(void) > > } > > > > #if (OPENSSL_VERSION_NUMBER >= 0x0090800fL) > > - if (dh_params) DH_free(dh_params); > > + if (dh_params) { > > + syslog(LOG_CRIT, "dh_params will be freed %p", dh_params); > > + DH_free(dh_params); > > + syslog(LOG_CRIT, "dh_params were freed %p", dh_params); > > + } > > #endif > > } > > ----- > > > > I did run mupdate with this debug output > > > > > > Jul 12 10:17:25 mx02 mu/mupdate[6537]: dh_params will be set, current > > value=(nil) > > Jul 12 10:17:25 mx02 mu/mupdate[6537]: inittls: Loading DH parameters > > from file > > Jul 12 10:17:25 mx02 mu/mupdate[6537]: dh_params were set > > current_value=0x7fc7541b9600 > > Jul 12 10:17:25 mx02 mu/mupdate[6537]: starttls: TLSv1.2 with cipher > > ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits new) no authentication > > Jul 12 10:17:25 mx02 mu/mupdate[6537]: login: msmx02.mail.localhost > > [10.23.21.78] cyrus PLAIN+TLS User logged in > > Jul 12 10:18:37 mx02 mu/mupdate[6537]: starttls: TLSv1.2 with cipher > > ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits new) no authentication > > Jul 12 10:18:37 mx02 mu/mupdate[6537]: login: msmx02.mail.localhost > > [10.23.21.78] cyrus PLAIN+TLS User logged in > > Jul 12 10:18:37 mx02 mu/mupdate[6537]: dh_params will be freed > > 0x7fc7541b9600 > > Jul 12 10:18:37 mx02 mu/mupdate[6537]: dh_params were freed 0x7fc7541b9600 > > Jul 12 10:18:37 mx02 mu/mupdate[6537]: starttls: TLSv1.2 with cipher > > ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits new) no authentication > > Jul 12 10:18:37 mx02 mu/mupdate[6537]: login: msmx02.mail.localhost > > [10.23.21.78] cyrus PLAIN+TLS User logged in > > Jul 12 10:18:37 mx02 mu/mupdate[6537]: dh_params will be freed > > 0x7fc7541b9600 > > Jul 12 10:18:38 mx02 mu/master[6534]: process type:SERVICE > > name:mupdate path:/usr/local/libexec/mupdate age:89.131s pid:6537 > > signaled to death by signal 11 (Segmentation fault, core dumped) > > Jul 12 10:18:38 mx02 mu/master[6534]: service mupdate/ipv4 pid 6537 in > > READY state: terminated abnormally > > > > so it seems to me that the dh_params were set once on startup but > > freed for each closed connection > > > > Regards > > > > Michael Menge > > > > > > Quoting Michael Menge <michael.me...@zdv.uni-tuebingen.de>: > > > > > Hi Дилян, > > > > > > > > > Quoting Дилян Палаузов <dilyan.palau...@aegee.org>: > > > > > >> Hello Michael, > > >> > > >> this is likely either a memory mishandling issue (use after free(), > > >> double free(), invalid read()/write()...), which gets evident if cyrus > > >> is run under valgrind --tool=memcheck. I run it with > > >> > > >> valgrind --num-callers=30 --leak-check=full --track-origins=yes --read- > > >> var-info=yes --show-leak-kinds=all --trace-children=yes --track-fds=yes > > >> /usr/local/cyrus/bin/master -D &> log-file-memcheck > > >> > > > > > > thanks for the valgind options. Valgrind did find some "Invalid read > > > of size 4" > > > and "size 8" in DH_free as well as "Invalid write of size 4" in > > > CRYPTO_add_lock > > > and "size 8" in OPENSSL_cleanse. As far as i can tell the memory was > > > always free'd > > > before by CRYPTO_free (mem.c:434). > > > > > > I have attached the full log > > > > > >> > > >> Another reason can be multi-threaded inconsistencies: mutexes locked in > > >> inconsistent order by differnt threads (while this is not a cause for > > >> crash, it leads to deadlock), mutexes locked in one thread and unlocked > > >> in another or alike. This can be detected by valgrind/helgrind > > >> > > >> valgrind --tool=helgrind --num-callers=30 --leak-check=full --track- > > >> origins=yes --read-var-info=yes --trace-children=yes --track-fds=yes > > >> /usr/local/cyrus/bin/master -D &> log-file-helgrind > > >> > > > > > > I had to remove some options here, as my valgrind didn't know them with > > > --tool=helgrind > > > > > > I used /usr/bin/valgrind --tool=helgrind --num-callers=30 > > > --read-var-info=yes --trace-children=yes --track-fds=yes > > > /usr/local/libexec/master -C /etc/imapd_mu.conf -M > > > /etc/cyrus_mu.conf -p /var/run/cyurs_mu.pid -D &> > > > /tmp/cyrus-mupdate-log-file-helgrind > > > > > > Valgrind did find some "Possible data race during read" and "This > > > conflicts with a previous write" > > > I have attached the full log as well > > > > > > > > > > > >> Of course, it is useful to have debugging symbols. Compiling with -O3 > > >> migh make things faster, while compiling with -O0 will make it run > > >> considerably slower. If ld (=ld.bfd) is very, very recent (e.g. 2.31), > > >> then link with -Wl,-z,noseparate-code otherwise valgrind cannot read > > >> the debug info (https://bugs.kde.org/show_bug.cgi?id=395682). > > >> > > >> Then it shall be possible to see in one or the other log files what is > > >> relevant for causing SIGSERV, in particular the first place where bad > > >> operation happens, which might be some time before the crash. > > >> > > > > > > I hope this helps untangle the web of multiple threads, race conditions > > > and use after free, to find the root of this problem. > > > > > > If there is any more that i can do to help let me know. > > > > > >> > > >> On Mon, 2018-07-02 at 17:24 +0200, Michael Menge wrote: > > >>> Hi, > > >>> > > >>> we are in the process of setting up our new production mailserver with > > >>> cyrus-imapd 3.0.7 on RHEL 7.5 Servers. > > >>> > > >>> At the moment we encounter many crashes (SIGSEGV) of the mupdate > > >>> process on > > >>> the mupdate master instance. As soon as we issue a command that updates > > >>> multiple mailboxes in short time we trigger a SIGSEGV. > > >>> > > >>>> sam user/zrstes* cyrus all > > >>> > > >>> Setting ACL on user/zrstes1...OK. > > >>> Setting ACL on user/zrstes1/Mail...OK. > > >>> Setting ACL on user/zrstes1/Mail/drafts...OK. > > >>> Setting ACL on user/zrstes1/Mail/s-spam...OK. > > >>> Setting ACL on user/zrstes1/Mail/sent...OK. > > >>> Setting ACL on user/zrstes1/Mail/trash...OK. > > >>> Setting ACL on user/zrstes1/Mail/v-spam...OK. > > >>> Setting ACL on user/zrstes2...cyrus: lrswipkxtea: no connection to > > >>> server > > >>> > > >>> I suspect that at this time the connection from the backend to the > > >>> mupdate master > > >>> is lost as the mupdate process received the SIGSEGV > > >>> > > >>> (gdb) bt > > >>> #0 0x00007fb49c000098 in ?? () > > >>> #1 0x00007fb4a9a59a85 in DH_free (r=0x7fb49c1b9600) at dh_lib.c:194 > > >>> #2 0x00007fb4aa84dbef in tls_shutdown_serverengine () at > > >>> imap/tls.c:1311 > > >>> #3 0x0000000000404075 in conn_free (C=0x7fb4840009f0) at > > >>> imap/mupdate.c:379 > > >>> #4 0x00000000004062c0 in thread_main (rock=0x0) at imap/mupdate.c:1330 > > >>> #5 0x00007fb4a771add5 in start_thread (arg=0x7fb4a22d9700) at > > >>> pthread_create.c:308 > > >>> #6 0x00007fb4a718fb3d in clone () at > > >>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 > > >>> > > >>> > > >>> Further information, as DH_free is in the bt, we have NOT configured > > >>> dh_params in > > >>> our ssl certificate/key or in the imapd.conf > > >>> > > > > > > Kind regards > > > > > > Michael Menge > > > > > > > > > > > > -------------------------------------------------------------------------------- > > > M.Menge Tel.: (49) 7071/29-70316 > > > Universität Tübingen Fax.: (49) 7071/29-5912 > > > Zentrum für Datenverarbeitung mail: > > > michael.me...@zdv.uni-tuebingen.de > > > Wächterstraße 76 > > > 72074 Tübingen > > > > > > > > -------------------------------------------------------------------------------- > > M.Menge Tel.: (49) 7071/29-70316 > > Universität Tübingen Fax.: (49) 7071/29-5912 > > Zentrum für Datenverarbeitung mail: > > michael.me...@zdv.uni-tuebingen.de > > Wächterstraße 76 > > 72074 Tübingen > >