A NOTE has been added to this issue. 
====================================================================== 
http://dbmail.org/mantis/view.php?id=1081 
====================================================================== 
Reported By:                PeterS
Assigned To:                
====================================================================== 
Project:                    DBMail
Issue ID:                   1081
Category:                   Authentication layer
Reproducibility:            always
Severity:                   major
Priority:                   normal
Status:                     new
target:                      
====================================================================== 
Date Submitted:             10-Aug-16 23:27 CEST
Last Modified:              20-Oct-16 23:46 CEST
====================================================================== 
Summary:                    DBmail ABEND'ing upon LDAP access error.
Description: 
I've been trying to find a version of DBmail 3.X that does not ABEND while
it is accessing LDAP.  I have not been able to determine the sequence of
events that causes the ABEND, however it happens to me many many times a
day.  I've written a script around the execution of DBmail to ensure that
it restarts when it ABENDs.
====================================================================== 

---------------------------------------------------------------------- 
 (0003740) PeterS (reporter) - 11-Aug-16 19:57
 http://dbmail.org/mantis/view.php?id=1081#c3740 
---------------------------------------------------------------------- 
Persuant to the "Bug posting etiquette", yesterday I pulled a clean,
newest, version of DBmail from GIT, compiled it, installed it, cleared the
dbmail.err log, and started it.  Since then I have received 14 ABENDs.

[compiler@swlx143 dbmail-git]$ date ; git status -u no
Thu Aug 11 13:02:11 CDT 2016
# On branch master
nothing to commit (working directory clean)
[compiler@swlx143 dbmail-git]$ 

---------------------------------------------------------------------- 
 (0003745) PeterS (reporter) - 07-Sep-16 23:37
 http://dbmail.org/mantis/view.php?id=1081#c3745 
---------------------------------------------------------------------- 
Today I migrated my database over to new hardware and am receiving the same
error.

dbmail-imapd: sasl.c:75: ldap_sasl_bind: Assertion `(
(ld)->ldc->ldc_options.ldo_valid == 0x2 )' failed.

Red Hat RHEL7
dbmail-auth-ldap-3.2.3-1.el7.x86_64
dbmail-3.2.3-1.el7.x86_64
postgresql-9.2.15-1.el7_2.x86_64
postgresql-server-9.2.15-1.el7_2.x86_64 

---------------------------------------------------------------------- 
 (0003746) alan (reporter) - 08-Sep-16 19:01
 http://dbmail.org/mantis/view.php?id=1081#c3746 
---------------------------------------------------------------------- 
This error appears to be ldap connection related. Error reports elsewhere
suggest either something in /etc/openldap/ldap.conf or library
compatibility following updates to openssl. Are there any certificates
missing?
Are you able to compile dbmail locally, this may address library issues. 

---------------------------------------------------------------------- 
 (0003749) PeterS (reporter) - 15-Sep-16 17:31
 http://dbmail.org/mantis/view.php?id=1081#c3749 
---------------------------------------------------------------------- 
Alan, I have already tried the latest build(s) of DBmail to no avail.  I've
disabled all certificate handling in OPENLDAP using both:

TLS_REQCERT never

and also:

TLS_REQCERT allow

with no change in results.  We have multiple Microsoft AD LDAP servers
here, with varying certificates that aren't really verifiable, so I have
always disabled tracking or requiring valid certificates.  We have other
Linux systems using LDAP/OPENLDAP without any issues using this same
config. 

---------------------------------------------------------------------- 
 (0003751) alan (reporter) - 28-Sep-16 20:24
 http://dbmail.org/mantis/view.php?id=1081#c3751 
---------------------------------------------------------------------- 
Confirm I'm able to reproduce the issue, think I've tracked down the parts
of the code causing it. Will test asap. 

---------------------------------------------------------------------- 
 (0003752) alan (reporter) - 30-Sep-16 16:58
 http://dbmail.org/mantis/view.php?id=1081#c3752 
---------------------------------------------------------------------- 
This issues appears to be fixed using the attached patch, please test. 

---------------------------------------------------------------------- 
 (0003753) PeterS (reporter) - 30-Sep-16 23:37
 http://dbmail.org/mantis/view.php?id=1081#c3753 
---------------------------------------------------------------------- 
I've patched the Red Hat RHEL 7 (EPEL 7) version of DBmail
(dbmail-3.2.3-1.el7.x86_64, dbmail-auth-ldap-3.2.3-1.el7.x86_64, and
dbmail-debuginfo-3.2.3-1.el7.x86_64) with
dbmail_src_modules_authldap.c-20160930.diff and receive the attached error,
dbmail.20160930-1642.err .  The service still ABEND's. 

---------------------------------------------------------------------- 
 (0003754) PeterS (reporter) - 01-Oct-16 00:18
 http://dbmail.org/mantis/view.php?id=1081#c3754 
---------------------------------------------------------------------- 
I've patched a GIT clone (git clone git://git.dbmail.eu/paul/dbmail) from
today with dbmail_src_modules_authldap.c-20160930.diff and receive the
attached error, dbmail.20160930-1721.err .  The service still ABEND's. 

---------------------------------------------------------------------- 
 (0003755) alan (reporter) - 01-Oct-16 15:36
 http://dbmail.org/mantis/view.php?id=1081#c3755 
---------------------------------------------------------------------- 
Ok, latest patch src_modules_authldap.c-20161001.diff increases the time to
check for server gone away to 2 minutes.

There is an edge case where after binding, the connection instantly goes
away. Also the daemon exits if the ldap server doesn't reappear after two
minutes, similar to not starting if not available.

There should be warnings such as the following, then recovers when the
server returns.
Oct 01 14:06:38 lully.p-o.co.uk dbmail-imapd[96422]: [0x805c0a800]
Warning:[auth] ldap_con_get(+142): LDAP gone away: Can't contact LDAP
server. Trying to reconnect(1/5).
Oct 01 14:06:39 lully.p-o.co.uk dbmail-imapd[96422]: [0x805c0a800]
Warning:[auth] ldap_con_get(+142): LDAP gone away: Can't contact LDAP
server. Trying to reconnect(2/5).
Oct 01 14:06:49 lully.p-o.co.uk dbmail-imapd[96422]: [0x805c0a800]
Warning:[auth] ldap_con_get(+142): LDAP gone away: Can't contact LDAP
server. Trying to reconnect(119/5).

Does this get closer to addressing your issue? 

---------------------------------------------------------------------- 
 (0003756) PeterS (reporter) - 03-Oct-16 16:54
 http://dbmail.org/mantis/view.php?id=1081#c3756 
---------------------------------------------------------------------- 
I've patched the Red Hat RHEL 7 (EPEL 7) version of DBmail
(dbmail-3.2.3-1.el7.x86_64, dbmail-auth-ldap-3.2.3-1.el7.x86_64, and
dbmail-debuginfo-3.2.3-1.el7.x86_64) with
src_modules_authldap.c-20161001.diff and receive the attached error,
dbmail.20161003-0956.err .  This time I've cut the last parts of the full
dbmail.err file which consists of the last threads 0x7f86e84e8000 and
0x7f86e81d1400 .  The service still ABEND's.

I'll now try with today's GIT version and the patch. 

---------------------------------------------------------------------- 
 (0003757) PeterS (reporter) - 03-Oct-16 19:57
 http://dbmail.org/mantis/view.php?id=1081#c3757 
---------------------------------------------------------------------- 
I've patched a GIT clone (git clone git://git.dbmail.eu/paul/dbmail) from
today with src_modules_authldap.c-20161001.diff and receive the attached
error, dbmail.20161003-1055.err (dbmail.20161003-1055.err.xz) . This time
I've cut the last parts of the full dbmail.err file which consists of the
last threads 0x1466450 and 0xde7400.  The service still ABEND's.

 

---------------------------------------------------------------------- 
 (0003758) alan (reporter) - 04-Oct-16 17:54
 http://dbmail.org/mantis/view.php?id=1081#c3758 
---------------------------------------------------------------------- 
Patch src_modules_authldap.c-20161004.diff adds slightly more debugging
information, plus binding and search retries.

The error appears to be transient and localised in authldap_search, this
patch should address the issue.

fyi there is a copy at https://github.com/alan-hicks/dbmail 

---------------------------------------------------------------------- 
 (0003759) PeterS (reporter) - 04-Oct-16 21:47
 http://dbmail.org/mantis/view.php?id=1081#c3759 
---------------------------------------------------------------------- 
I pulled down a clone of your https://github.com/alan-hicks/dbmail and had
it running.  This version does not ABEND but rather stops communicating
(which caused my client to Segfault in the middle of doing IMAPS things).

Please see the attached dbmail.20161004-1348.err
(dbmail.20161004-1348.err.xz) for the full error listing.

Correction: it did ABEND when I restarted my IMAPS client and attempted to
connect to the DBmail instance.  The same "dbmail-imapd: dm_config.c:134:
config_get_value_once: Assertion `config_dict' failed." error as before is
the last thing logged.

 

---------------------------------------------------------------------- 
 (0003760) alan (reporter) - 06-Oct-16 15:57
 http://dbmail.org/mantis/view.php?id=1081#c3760 
---------------------------------------------------------------------- 
I'm struggling to find what causes dbmail to be unable to contact the
server.
Patch src_modules_authldap.c-20161006.diff bumps the search timeout to 60
seconds.
The error is that the ldap server can't be contacted. Although there is
only a single connection, nothing suggests it's blocked, out of memory or
any of the normal errors. As there are relatively few ldap server queries
compared to sql, pooling probably wouldn't help. I'm assuming the ldap
server/active directory is fine and that it's just dbmail that's unable to
connect.
Not sure what would have caused your client to segfault, the retries would
have paused sending anything back to the client and at most would have
dropped the connection.
Unable to see any "dm_config.c:134: config_get_value_once: Assertion"
errors in the log, can you upload the part of the log leading up to that
assertion please as I'm not sure where getting config values would be
called after startup. 

---------------------------------------------------------------------- 
 (0003761) PeterS (reporter) - 06-Oct-16 19:45
 http://dbmail.org/mantis/view.php?id=1081#c3761 
---------------------------------------------------------------------- 
Alan, thank you for continuing to assist with this.  At some point I may
write a script that will allow me to sanitize my logs so that I can just
post the entire thing to you.

I am attaching even more information for you.

It looks like my dbmail is not disconnecting sessions that it makes with
our LDAP server, and apparently finally ABEND's after it attempts to re-use
a connection to LDAP that is over an hour old.

Please see the attached logs -- dbmail.20161006-1236.err
(dbmail.20161006-1236.err.xz), dbmail.20161006-1236.ldaps_tcpdump.txt
(dbmail.20161006-1236.ldaps_tcpdump.txt.xz), and
dbmail.20161006-1236.extra.txt (dbmail.20161006-1236.extra.txt.xz) .

dbmail.20161006-1236.err contains the very beginning of the log and a large
portion of the end of the log.  The "dm_config.c:134:" error is there. 
dbmail.20161006-1236.ldaps_tcpdump.txt is the text framing output of the
*full* LDAPS conversation with my LDAP server (I captured all traffic to
and from its IP address.)  dbmail.20161006-1236.extra.txt contains all of
the "ldap_con_get"-s that exist in the full dbmail.err log.  There is a
drop-off from 11:14:53 to 12:15:05 at which point DBmail ABENDs. 

---------------------------------------------------------------------- 
 (0003762) alan (reporter) - 12-Oct-16 20:13
 http://dbmail.org/mantis/view.php?id=1081#c3762 
---------------------------------------------------------------------- 
I'm happy to continue supporting dbmail, it's used here so makes sense to
share and keep it up to date.

It's normal to keep ldap connections open if the threads are still extant,
it's similar for the database.

As there are no obvious reasons why the search suddenly stops without an
error, it may be due to a long running thread hogging a resource, I'm aware
that occasionally a long folder listing/update appears to suspend some
clients.

Patch src_modules_authldap.c-20161012.diff updates some deprecated
functions and takes the timeout value from query_timeout in dbmail.conf
which from memory defaults to 300 (5 minutes).

If this fixes the issue great, otherwise updating the code to be more
resource friendly looks as though it will take much longer.  As
dbmail.20161006-1236.err.xz timed out after 60 seconds it may not achieve
much, though you might try updating query_timeout to 3600 (an hour) to test
if whatever is causing the block is released. Hopefully whatever precedes
any resumption will help pinpoint the blocking. 

---------------------------------------------------------------------- 
 (0003763) alan (reporter) - 13-Oct-16 17:16
 http://dbmail.org/mantis/view.php?id=1081#c3763 
---------------------------------------------------------------------- 
patch src_server.c-20161013.diff removes some redundant event and profiling
code. 

---------------------------------------------------------------------- 
 (0003764) alan (reporter) - 14-Oct-16 19:01
 http://dbmail.org/mantis/view.php?id=1081#c3764 
---------------------------------------------------------------------- 
Apologies for previous patch removing required event_assign. I tested the
wrong binary.
All appears working with the code at https://github.com/alan-hicks/dbmail
Still unable to reproduce LDAP gone away. 

---------------------------------------------------------------------- 
 (0003765) PeterS (reporter) - 14-Oct-16 19:33
 http://dbmail.org/mantis/view.php?id=1081#c3765 
---------------------------------------------------------------------- 
Ok; yes, I did experience some issues with that latest patch, so I will
give this newer one a try.  Thanks. 

---------------------------------------------------------------------- 
 (0003766) PeterS (reporter) - 14-Oct-16 20:50
 http://dbmail.org/mantis/view.php?id=1081#c3766 
---------------------------------------------------------------------- 
With your patch from today, it looks as though at some point it gives up on
the LDAPS server that is configured and while it claims to be
"reconnecting", it in fact is not sending any requests to the LDAPS
server...

It is logging entries like the following:

Oct 14 13:53:37 swlx143.swmed.e dbmail-imapd[1552]: [0x25a0630]
Warning:[auth] authldap_search(+293): LDAP gone away: Can't contact LDAP
server. Trying again(519/3600).
Oct 14 13:53:37 swlx143.swmed.e dbmail-imapd[1552]: [0x25a04a0]
Warning:[auth] authldap_search(+293): LDAP gone away: Can't contact LDAP
server. Trying again(429/3600).
Oct 14 13:53:38 swlx143.swmed.e dbmail-imapd[1552]: [0x25a0630]
Warning:[auth] authldap_search(+293): LDAP gone away: Can't contact LDAP
server. Trying again(520/3600).
Oct 14 13:53:38 swlx143.swmed.e dbmail-imapd[1552]: [0x25a04a0]
Warning:[auth] authldap_search(+293): LDAP gone away: Can't contact LDAP
server. Trying again(430/3600).

A TCPDUMP of all traffic for the destination of the LDAPS server that is
configured reveals NO TRAFFIC being generated.  Am I misunderstanding what
"Trying again" means perhaps? 

---------------------------------------------------------------------- 
 (0003767) alan (reporter) - 14-Oct-16 22:40
 http://dbmail.org/mantis/view.php?id=1081#c3767 
---------------------------------------------------------------------- 
That part of the code is trying to send a request, something in dbmail is
blocking it. Are existing sessions still OK, is it possible to start a new
session? Any clues to what else might be waiting or stalled? 

---------------------------------------------------------------------- 
 (0003768) alan (reporter) - 17-Oct-16 10:34
 http://dbmail.org/mantis/view.php?id=1081#c3768 
---------------------------------------------------------------------- 
Default timeout of 4000 appears excessive, I'm successfully running
production with 300 (5 mins), perhaps 60 might be useful for testing and
may illuminate a stuck thread. 

---------------------------------------------------------------------- 
 (0003769) PeterS (reporter) - 17-Oct-16 21:07
 http://dbmail.org/mantis/view.php?id=1081#c3769 
---------------------------------------------------------------------- 
Here is the latest.

dbmail.20161017-1031.err.xz (dbmail.20161017-1031.err) and
dbmail.20161017-1031.extra.txt (dbmail.20161017-1031.extra.txt.xz)

Updated configuration for your suggested timeout, "query_timeout = 60". 
Still ABENDs, with the last line being:

"dbmail-imapd: dm_config.c:134: config_get_value_once: Assertion
`config_dict' failed." 

---------------------------------------------------------------------- 
 (0003770) alan (reporter) - 18-Oct-16 15:05
 http://dbmail.org/mantis/view.php?id=1081#c3770 
---------------------------------------------------------------------- 
Just pushed a commit so the connection should show an error but not ABEND.
The imap connection should fail and require re-authentication, depending on
how the client connects it may need to be restarted. All should be ok if
the other threads continue and either the affected thread dies with the
connection or fully recovers after 'unrecoverable error while talking to
ldap server'. 

---------------------------------------------------------------------- 
 (0003771) PeterS (reporter) - 18-Oct-16 19:52
 http://dbmail.org/mantis/view.php?id=1081#c3771 
---------------------------------------------------------------------- 
The update is a bit better as it does not ABEND any longer.  However once
it loses its connection to LDAP it can not recover and infinitely attempts
to connect.

See sanitized log, dbmail.20161018-1122.err (dbmail.20161018-1122.err.xz).

I have watched for any traffic outbound from the system to LDAP or LDAPS
and it is not actually attempting or sending anything, so the connection
attempts are bogus. 

---------------------------------------------------------------------- 
 (0003772) alan (reporter) - 19-Oct-16 18:58
 http://dbmail.org/mantis/view.php?id=1081#c3772 
---------------------------------------------------------------------- 
LDAP call updated to remove a deprecated function.

It appears timeout's don't work for synchronous calls:
http://www.openldap.org/lists/openldap-technical/201311/msg00266.html 

---------------------------------------------------------------------- 
 (0003773) PeterS (reporter) - 20-Oct-16 23:44
 http://dbmail.org/mantis/view.php?id=1081#c3773 
---------------------------------------------------------------------- 
It still is not ABEND'ing, which is great.

However it still appears to be attempting to re-use a long-dead LDAPS
connection (0x1049d90) provided in this case.

Please see attached sanitized log parts,
dbmail.20161020-1644.imap_session_handle_auth.err
(dbmail.20161020-1644.imap_session_handle_auth.err.xz),
dbmail.20161020-1644.ldap_tcpdump.txt
(dbmail.20161020-1644.ldap_tcpdump.txt.xz), and
dbmail.20161020-1644.thread_0x1049d90.err
(dbmail.20161020-1644.thread_0x1049d90.err.xz).

The provided text-only TCPDUMP/Wireshark output is actually still running
right now, and shows no traffic at all to the configured LDAP/LDAPS server
since 2016-10-20 11:42:05 .

I focused on thread ID 0x1049d90 since that seems to be the one that it
chose to re-use after having not used it in over 1 hour.  It blindly
retries and retries infinitely. 

---------------------------------------------------------------------- 
 (0003774) PeterS (reporter) - 20-Oct-16 23:46
 http://dbmail.org/mantis/view.php?id=1081#c3774 
---------------------------------------------------------------------- 
It actually isn't infinite, but it does NOT re-establish a connection with
the LDAP/LDAPS server, and issues a "Notice:[imap]
dbmail_imap_session_handle_auth(+1342): [0x110e860] login rejected: user"
message to my IMAPS client. 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
10-Aug-16 23:27  PeterS         New Issue                                    
10-Aug-16 23:27  PeterS         File Added: dbmail.err                       
11-Aug-16 19:57  PeterS         Note Added: 0003740                          
07-Sep-16 23:37  PeterS         Note Added: 0003745                          
08-Sep-16 19:01  alan           Note Added: 0003746                          
15-Sep-16 17:31  PeterS         Note Added: 0003749                          
28-Sep-16 20:24  alan           Note Added: 0003751                          
30-Sep-16 16:58  alan           Note Added: 0003752                          
30-Sep-16 17:01  alan           File Added: dbmail_src_modules_authldap.c.diff  
                 
30-Sep-16 18:15  alan           Issue Monitored: alan                        
30-Sep-16 18:17  alan           File Added:
dbmail_src_modules_authldap.c-20160930.diff                    
30-Sep-16 23:37  PeterS         Note Added: 0003753                          
30-Sep-16 23:37  PeterS         File Added: dbmail.20160930-1642.err            
       
01-Oct-16 00:18  PeterS         Note Added: 0003754                          
01-Oct-16 00:18  PeterS         File Added: dbmail.20160930-1721.err            
       
01-Oct-16 15:29  alan           File Added: src_modules_authldap.c-20161001.diff
                   
01-Oct-16 15:36  alan           Note Added: 0003755                          
03-Oct-16 16:54  PeterS         Note Added: 0003756                          
03-Oct-16 16:56  PeterS         File Added: dbmail.20161003-0956.err.xz         
          
03-Oct-16 17:52  PeterS         Note Added: 0003757                          
03-Oct-16 17:52  PeterS         File Added: dbmail.20161003-1055.err.xz         
          
03-Oct-16 19:57  PeterS         Note Edited: 0003757                         
04-Oct-16 17:54  alan           Note Added: 0003758                          
04-Oct-16 17:54  alan           File Added: src_modules_authldap.c-20161004.diff
                   
04-Oct-16 20:47  PeterS         Note Added: 0003759                          
04-Oct-16 20:48  PeterS         File Added: dbmail.20161004-1348.err.xz         
          
04-Oct-16 21:47  PeterS         Note Edited: 0003759                         
06-Oct-16 15:57  alan           Note Added: 0003760                          
06-Oct-16 15:57  alan           File Added: src_modules_authldap.c-20161006.diff
                   
06-Oct-16 19:45  PeterS         Note Added: 0003761                          
06-Oct-16 19:45  PeterS         File Added: dbmail.20161006-1236.err.xz         
          
06-Oct-16 19:46  PeterS         File Added:
dbmail.20161006-1236.ldaps_tcpdump.txt.xz                    
06-Oct-16 19:46  PeterS         File Added: dbmail.20161006-1236.extra.txt.xz   
                
12-Oct-16 19:20  alan           File Added: src_modules_authldap.c-20161012.diff
                   
12-Oct-16 20:13  alan           Note Added: 0003762                          
13-Oct-16 17:14  alan           File Added: src_server.c-20161013.diff          
         
13-Oct-16 17:16  alan           Note Added: 0003763                          
14-Oct-16 19:01  alan           Note Added: 0003764                          
14-Oct-16 19:33  PeterS         Note Added: 0003765                          
14-Oct-16 20:50  PeterS         Note Added: 0003766                          
14-Oct-16 22:40  alan           Note Added: 0003767                          
17-Oct-16 10:34  alan           Note Added: 0003768                          
17-Oct-16 21:07  PeterS         Note Added: 0003769                          
17-Oct-16 21:08  PeterS         File Added: dbmail.20161017-1031.err.xz         
          
17-Oct-16 21:08  PeterS         File Added: dbmail.20161017-1031.extra.txt.xz   
                
18-Oct-16 15:05  alan           Note Added: 0003770                          
18-Oct-16 19:52  PeterS         Note Added: 0003771                          
18-Oct-16 19:56  PeterS         File Added: dbmail.20161018-1122.err.xz         
          
19-Oct-16 18:58  alan           Note Added: 0003772                          
20-Oct-16 23:44  PeterS         Note Added: 0003773                          
20-Oct-16 23:44  PeterS         File Added:
dbmail.20161020-1644.imap_session_handle_auth.err.xz                    
20-Oct-16 23:44  PeterS         File Added:
dbmail.20161020-1644.ldap_tcpdump.txt.xz                    
20-Oct-16 23:45  PeterS         File Added:
dbmail.20161020-1644.thread_0x1049d90.err.xz                    
20-Oct-16 23:46  PeterS         Note Added: 0003774                          
======================================================================

_______________________________________________
Dbmail-dev mailing list
Dbmail-dev@dbmail.org
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev

Reply via email to