A NOTE has been added to this issue. ====================================================================== http://dbmail.org/mantis/view.php?id=1081 ====================================================================== Reported By: PeterS Assigned To: ====================================================================== Project: DBMail Issue ID: 1081 Category: Authentication layer Reproducibility: always Severity: major Priority: normal Status: new target: ====================================================================== Date Submitted: 10-Aug-16 23:27 CEST Last Modified: 14-Oct-16 20:50 CEST ====================================================================== Summary: DBmail ABEND'ing upon LDAP access error. Description: I've been trying to find a version of DBmail 3.X that does not ABEND while it is accessing LDAP. I have not been able to determine the sequence of events that causes the ABEND, however it happens to me many many times a day. I've written a script around the execution of DBmail to ensure that it restarts when it ABENDs. ======================================================================
---------------------------------------------------------------------- (0003740) PeterS (reporter) - 11-Aug-16 19:57 http://dbmail.org/mantis/view.php?id=1081#c3740 ---------------------------------------------------------------------- Persuant to the "Bug posting etiquette", yesterday I pulled a clean, newest, version of DBmail from GIT, compiled it, installed it, cleared the dbmail.err log, and started it. Since then I have received 14 ABENDs. [compiler@swlx143 dbmail-git]$ date ; git status -u no Thu Aug 11 13:02:11 CDT 2016 # On branch master nothing to commit (working directory clean) [compiler@swlx143 dbmail-git]$ ---------------------------------------------------------------------- (0003745) PeterS (reporter) - 07-Sep-16 23:37 http://dbmail.org/mantis/view.php?id=1081#c3745 ---------------------------------------------------------------------- Today I migrated my database over to new hardware and am receiving the same error. dbmail-imapd: sasl.c:75: ldap_sasl_bind: Assertion `( (ld)->ldc->ldc_options.ldo_valid == 0x2 )' failed. Red Hat RHEL7 dbmail-auth-ldap-3.2.3-1.el7.x86_64 dbmail-3.2.3-1.el7.x86_64 postgresql-9.2.15-1.el7_2.x86_64 postgresql-server-9.2.15-1.el7_2.x86_64 ---------------------------------------------------------------------- (0003746) alan (reporter) - 08-Sep-16 19:01 http://dbmail.org/mantis/view.php?id=1081#c3746 ---------------------------------------------------------------------- This error appears to be ldap connection related. Error reports elsewhere suggest either something in /etc/openldap/ldap.conf or library compatibility following updates to openssl. Are there any certificates missing? Are you able to compile dbmail locally, this may address library issues. ---------------------------------------------------------------------- (0003749) PeterS (reporter) - 15-Sep-16 17:31 http://dbmail.org/mantis/view.php?id=1081#c3749 ---------------------------------------------------------------------- Alan, I have already tried the latest build(s) of DBmail to no avail. I've disabled all certificate handling in OPENLDAP using both: TLS_REQCERT never and also: TLS_REQCERT allow with no change in results. We have multiple Microsoft AD LDAP servers here, with varying certificates that aren't really verifiable, so I have always disabled tracking or requiring valid certificates. We have other Linux systems using LDAP/OPENLDAP without any issues using this same config. ---------------------------------------------------------------------- (0003751) alan (reporter) - 28-Sep-16 20:24 http://dbmail.org/mantis/view.php?id=1081#c3751 ---------------------------------------------------------------------- Confirm I'm able to reproduce the issue, think I've tracked down the parts of the code causing it. Will test asap. ---------------------------------------------------------------------- (0003752) alan (reporter) - 30-Sep-16 16:58 http://dbmail.org/mantis/view.php?id=1081#c3752 ---------------------------------------------------------------------- This issues appears to be fixed using the attached patch, please test. ---------------------------------------------------------------------- (0003753) PeterS (reporter) - 30-Sep-16 23:37 http://dbmail.org/mantis/view.php?id=1081#c3753 ---------------------------------------------------------------------- I've patched the Red Hat RHEL 7 (EPEL 7) version of DBmail (dbmail-3.2.3-1.el7.x86_64, dbmail-auth-ldap-3.2.3-1.el7.x86_64, and dbmail-debuginfo-3.2.3-1.el7.x86_64) with dbmail_src_modules_authldap.c-20160930.diff and receive the attached error, dbmail.20160930-1642.err . The service still ABEND's. ---------------------------------------------------------------------- (0003754) PeterS (reporter) - 01-Oct-16 00:18 http://dbmail.org/mantis/view.php?id=1081#c3754 ---------------------------------------------------------------------- I've patched a GIT clone (git clone git://git.dbmail.eu/paul/dbmail) from today with dbmail_src_modules_authldap.c-20160930.diff and receive the attached error, dbmail.20160930-1721.err . The service still ABEND's. ---------------------------------------------------------------------- (0003755) alan (reporter) - 01-Oct-16 15:36 http://dbmail.org/mantis/view.php?id=1081#c3755 ---------------------------------------------------------------------- Ok, latest patch src_modules_authldap.c-20161001.diff increases the time to check for server gone away to 2 minutes. There is an edge case where after binding, the connection instantly goes away. Also the daemon exits if the ldap server doesn't reappear after two minutes, similar to not starting if not available. There should be warnings such as the following, then recovers when the server returns. Oct 01 14:06:38 lully.p-o.co.uk dbmail-imapd[96422]: [0x805c0a800] Warning:[auth] ldap_con_get(+142): LDAP gone away: Can't contact LDAP server. Trying to reconnect(1/5). Oct 01 14:06:39 lully.p-o.co.uk dbmail-imapd[96422]: [0x805c0a800] Warning:[auth] ldap_con_get(+142): LDAP gone away: Can't contact LDAP server. Trying to reconnect(2/5). Oct 01 14:06:49 lully.p-o.co.uk dbmail-imapd[96422]: [0x805c0a800] Warning:[auth] ldap_con_get(+142): LDAP gone away: Can't contact LDAP server. Trying to reconnect(119/5). Does this get closer to addressing your issue? ---------------------------------------------------------------------- (0003756) PeterS (reporter) - 03-Oct-16 16:54 http://dbmail.org/mantis/view.php?id=1081#c3756 ---------------------------------------------------------------------- I've patched the Red Hat RHEL 7 (EPEL 7) version of DBmail (dbmail-3.2.3-1.el7.x86_64, dbmail-auth-ldap-3.2.3-1.el7.x86_64, and dbmail-debuginfo-3.2.3-1.el7.x86_64) with src_modules_authldap.c-20161001.diff and receive the attached error, dbmail.20161003-0956.err . This time I've cut the last parts of the full dbmail.err file which consists of the last threads 0x7f86e84e8000 and 0x7f86e81d1400 . The service still ABEND's. I'll now try with today's GIT version and the patch. ---------------------------------------------------------------------- (0003757) PeterS (reporter) - 03-Oct-16 19:57 http://dbmail.org/mantis/view.php?id=1081#c3757 ---------------------------------------------------------------------- I've patched a GIT clone (git clone git://git.dbmail.eu/paul/dbmail) from today with src_modules_authldap.c-20161001.diff and receive the attached error, dbmail.20161003-1055.err (dbmail.20161003-1055.err.xz) . This time I've cut the last parts of the full dbmail.err file which consists of the last threads 0x1466450 and 0xde7400. The service still ABEND's. ---------------------------------------------------------------------- (0003758) alan (reporter) - 04-Oct-16 17:54 http://dbmail.org/mantis/view.php?id=1081#c3758 ---------------------------------------------------------------------- Patch src_modules_authldap.c-20161004.diff adds slightly more debugging information, plus binding and search retries. The error appears to be transient and localised in authldap_search, this patch should address the issue. fyi there is a copy at https://github.com/alan-hicks/dbmail ---------------------------------------------------------------------- (0003759) PeterS (reporter) - 04-Oct-16 21:47 http://dbmail.org/mantis/view.php?id=1081#c3759 ---------------------------------------------------------------------- I pulled down a clone of your https://github.com/alan-hicks/dbmail and had it running. This version does not ABEND but rather stops communicating (which caused my client to Segfault in the middle of doing IMAPS things). Please see the attached dbmail.20161004-1348.err (dbmail.20161004-1348.err.xz) for the full error listing. Correction: it did ABEND when I restarted my IMAPS client and attempted to connect to the DBmail instance. The same "dbmail-imapd: dm_config.c:134: config_get_value_once: Assertion `config_dict' failed." error as before is the last thing logged. ---------------------------------------------------------------------- (0003760) alan (reporter) - 06-Oct-16 15:57 http://dbmail.org/mantis/view.php?id=1081#c3760 ---------------------------------------------------------------------- I'm struggling to find what causes dbmail to be unable to contact the server. Patch src_modules_authldap.c-20161006.diff bumps the search timeout to 60 seconds. The error is that the ldap server can't be contacted. Although there is only a single connection, nothing suggests it's blocked, out of memory or any of the normal errors. As there are relatively few ldap server queries compared to sql, pooling probably wouldn't help. I'm assuming the ldap server/active directory is fine and that it's just dbmail that's unable to connect. Not sure what would have caused your client to segfault, the retries would have paused sending anything back to the client and at most would have dropped the connection. Unable to see any "dm_config.c:134: config_get_value_once: Assertion" errors in the log, can you upload the part of the log leading up to that assertion please as I'm not sure where getting config values would be called after startup. ---------------------------------------------------------------------- (0003761) PeterS (reporter) - 06-Oct-16 19:45 http://dbmail.org/mantis/view.php?id=1081#c3761 ---------------------------------------------------------------------- Alan, thank you for continuing to assist with this. At some point I may write a script that will allow me to sanitize my logs so that I can just post the entire thing to you. I am attaching even more information for you. It looks like my dbmail is not disconnecting sessions that it makes with our LDAP server, and apparently finally ABEND's after it attempts to re-use a connection to LDAP that is over an hour old. Please see the attached logs -- dbmail.20161006-1236.err (dbmail.20161006-1236.err.xz), dbmail.20161006-1236.ldaps_tcpdump.txt (dbmail.20161006-1236.ldaps_tcpdump.txt.xz), and dbmail.20161006-1236.extra.txt (dbmail.20161006-1236.extra.txt.xz) . dbmail.20161006-1236.err contains the very beginning of the log and a large portion of the end of the log. The "dm_config.c:134:" error is there. dbmail.20161006-1236.ldaps_tcpdump.txt is the text framing output of the *full* LDAPS conversation with my LDAP server (I captured all traffic to and from its IP address.) dbmail.20161006-1236.extra.txt contains all of the "ldap_con_get"-s that exist in the full dbmail.err log. There is a drop-off from 11:14:53 to 12:15:05 at which point DBmail ABENDs. ---------------------------------------------------------------------- (0003762) alan (reporter) - 12-Oct-16 20:13 http://dbmail.org/mantis/view.php?id=1081#c3762 ---------------------------------------------------------------------- I'm happy to continue supporting dbmail, it's used here so makes sense to share and keep it up to date. It's normal to keep ldap connections open if the threads are still extant, it's similar for the database. As there are no obvious reasons why the search suddenly stops without an error, it may be due to a long running thread hogging a resource, I'm aware that occasionally a long folder listing/update appears to suspend some clients. Patch src_modules_authldap.c-20161012.diff updates some deprecated functions and takes the timeout value from query_timeout in dbmail.conf which from memory defaults to 300 (5 minutes). If this fixes the issue great, otherwise updating the code to be more resource friendly looks as though it will take much longer. As dbmail.20161006-1236.err.xz timed out after 60 seconds it may not achieve much, though you might try updating query_timeout to 3600 (an hour) to test if whatever is causing the block is released. Hopefully whatever precedes any resumption will help pinpoint the blocking. ---------------------------------------------------------------------- (0003763) alan (reporter) - 13-Oct-16 17:16 http://dbmail.org/mantis/view.php?id=1081#c3763 ---------------------------------------------------------------------- patch src_server.c-20161013.diff removes some redundant event and profiling code. ---------------------------------------------------------------------- (0003764) alan (reporter) - 14-Oct-16 19:01 http://dbmail.org/mantis/view.php?id=1081#c3764 ---------------------------------------------------------------------- Apologies for previous patch removing required event_assign. I tested the wrong binary. All appears working with the code at https://github.com/alan-hicks/dbmail Still unable to reproduce LDAP gone away. ---------------------------------------------------------------------- (0003765) PeterS (reporter) - 14-Oct-16 19:33 http://dbmail.org/mantis/view.php?id=1081#c3765 ---------------------------------------------------------------------- Ok; yes, I did experience some issues with that latest patch, so I will give this newer one a try. Thanks. ---------------------------------------------------------------------- (0003766) PeterS (reporter) - 14-Oct-16 20:50 http://dbmail.org/mantis/view.php?id=1081#c3766 ---------------------------------------------------------------------- With your patch from today, it looks as though at some point it gives up on the LDAPS server that is configured and while it claims to be "reconnecting", it in fact is not sending any requests to the LDAPS server... It is logging entries like the following: Oct 14 13:53:37 swlx143.swmed.e dbmail-imapd[1552]: [0x25a0630] Warning:[auth] authldap_search(+293): LDAP gone away: Can't contact LDAP server. Trying again(519/3600). Oct 14 13:53:37 swlx143.swmed.e dbmail-imapd[1552]: [0x25a04a0] Warning:[auth] authldap_search(+293): LDAP gone away: Can't contact LDAP server. Trying again(429/3600). Oct 14 13:53:38 swlx143.swmed.e dbmail-imapd[1552]: [0x25a0630] Warning:[auth] authldap_search(+293): LDAP gone away: Can't contact LDAP server. Trying again(520/3600). Oct 14 13:53:38 swlx143.swmed.e dbmail-imapd[1552]: [0x25a04a0] Warning:[auth] authldap_search(+293): LDAP gone away: Can't contact LDAP server. Trying again(430/3600). A TCPDUMP of all traffic for the destination of the LDAPS server that is configured reveals NO TRAFFIC being generated. Am I misunderstanding what "Trying again" means perhaps? Issue History Date Modified Username Field Change ====================================================================== 10-Aug-16 23:27 PeterS New Issue 10-Aug-16 23:27 PeterS File Added: dbmail.err 11-Aug-16 19:57 PeterS Note Added: 0003740 07-Sep-16 23:37 PeterS Note Added: 0003745 08-Sep-16 19:01 alan Note Added: 0003746 15-Sep-16 17:31 PeterS Note Added: 0003749 28-Sep-16 20:24 alan Note Added: 0003751 30-Sep-16 16:58 alan Note Added: 0003752 30-Sep-16 17:01 alan File Added: dbmail_src_modules_authldap.c.diff 30-Sep-16 18:15 alan Issue Monitored: alan 30-Sep-16 18:17 alan File Added: dbmail_src_modules_authldap.c-20160930.diff 30-Sep-16 23:37 PeterS Note Added: 0003753 30-Sep-16 23:37 PeterS File Added: dbmail.20160930-1642.err 01-Oct-16 00:18 PeterS Note Added: 0003754 01-Oct-16 00:18 PeterS File Added: dbmail.20160930-1721.err 01-Oct-16 15:29 alan File Added: src_modules_authldap.c-20161001.diff 01-Oct-16 15:36 alan Note Added: 0003755 03-Oct-16 16:54 PeterS Note Added: 0003756 03-Oct-16 16:56 PeterS File Added: dbmail.20161003-0956.err.xz 03-Oct-16 17:52 PeterS Note Added: 0003757 03-Oct-16 17:52 PeterS File Added: dbmail.20161003-1055.err.xz 03-Oct-16 19:57 PeterS Note Edited: 0003757 04-Oct-16 17:54 alan Note Added: 0003758 04-Oct-16 17:54 alan File Added: src_modules_authldap.c-20161004.diff 04-Oct-16 20:47 PeterS Note Added: 0003759 04-Oct-16 20:48 PeterS File Added: dbmail.20161004-1348.err.xz 04-Oct-16 21:47 PeterS Note Edited: 0003759 06-Oct-16 15:57 alan Note Added: 0003760 06-Oct-16 15:57 alan File Added: src_modules_authldap.c-20161006.diff 06-Oct-16 19:45 PeterS Note Added: 0003761 06-Oct-16 19:45 PeterS File Added: dbmail.20161006-1236.err.xz 06-Oct-16 19:46 PeterS File Added: dbmail.20161006-1236.ldaps_tcpdump.txt.xz 06-Oct-16 19:46 PeterS File Added: dbmail.20161006-1236.extra.txt.xz 12-Oct-16 19:20 alan File Added: src_modules_authldap.c-20161012.diff 12-Oct-16 20:13 alan Note Added: 0003762 13-Oct-16 17:14 alan File Added: src_server.c-20161013.diff 13-Oct-16 17:16 alan Note Added: 0003763 14-Oct-16 19:01 alan Note Added: 0003764 14-Oct-16 19:33 PeterS Note Added: 0003765 14-Oct-16 20:50 PeterS Note Added: 0003766 ====================================================================== _______________________________________________ Dbmail-dev mailing list Dbmail-dev@dbmail.org http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev