I have a standalone FreeIPA instance that is becoming unresponsive every few
hours. While in this state it will accept connections, but will not do anything
with them (i.e. if you connect an ldaps client to 636, you see
SYN->SYNACK->ACK->ClientHello, but a ServerHello is not returned). This system
is running FreeIPA 4.4.0 currently, but this also occurred on 4.2.x. Time is
synchronised correctly and this is a fairly new installation so all the PKI
expiry dates are well into the future.
It handles queries without complaint, right up until the point it doesn't.
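The stalled handshake described above (TCP connects, ClientHello goes out, no ServerHello comes back) can be checked from a client with a small probe. This is a minimal sketch in Python: the function name `probe_tls`, the result strings and the 5-second timeout are illustrative choices, not anything provided by FreeIPA or 389 DS. Certificate verification is deliberately switched off because only the handshake timing matters here.

```python
import socket
import ssl


def probe_tls(host, port=636, timeout=5.0):
    """Classify what happens when a TLS client connects to host:port.

    Returns one of:
      "tcp-failed"        - the TCP connection itself could not be made
      "handshake-stalled" - TCP connected, but the TLS handshake timed out
      "handshake-failed"  - the handshake ended with an error
      "ok"                - the handshake completed
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp-failed"

    # Verification is disabled on purpose: this probe only cares whether
    # the server answers the ClientHello at all.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    try:
        # The handshake runs inside wrap_socket() and inherits the
        # socket timeout set by create_connection().
        with ctx.wrap_socket(sock, server_hostname=host):
            return "ok"
    except socket.timeout:
        return "handshake-stalled"
    except (ssl.SSLError, OSError):
        return "handshake-failed"
    finally:
        sock.close()


if __name__ == "__main__":
    print(probe_tls("ldap-001.domain"))
```

Run periodically (from cron, for example), a "handshake-stalled" result would give a timestamp for when the server stops answering.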
Inspecting the process with strace shows it waiting on a socket:
getpeername(7, 0x7ffeb749af70, [112]) = -1 ENOTCONN (Transport endpoint is not connected)
poll([{fd=50, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN},
    {fd=8, events=POLLIN}, {fd=66, events=POLLIN}, {fd=80, events=POLLIN},
    {fd=79, events=POLLIN}, {fd=78, events=POLLIN}, {fd=77, events=POLLIN},
    {fd=76, events=POLLIN}, {fd=75, events=POLLIN}, {fd=73, events=POLLIN},
    {fd=71, events=POLLIN}, {fd=70, events=POLLIN}, {fd=68, events=POLLIN}],
    15, 250) = 0 (Timeout)
fd 7 is a constant:
ls -l /proc/2428/fd
lrwx------. 1 root root 64 Jan 6 17:16 7 -> socket:[18972]
I'm not sure I'm reading the fd entry correctly, but I believe socket 18972 is
this entry, the LDAPS listening socket:
[root@ldap-001 log]# lsof -p 2428 | grep 18972
ns-slapd 2428 dirsrv 7u IPv6 18972 0t0 TCP *:ldaps (LISTEN)
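The same fd-to-socket mapping can be cross-checked without lsof by reading /proc directly. The sketch below is a rough Python equivalent under stated assumptions: the helper names (`parse_socket_inode`, `socket_fds`, `listening_port`) are made up for illustration, it needs enough privilege to read /proc/<pid>/fd, and it relies on the Linux /proc/net/tcp and /proc/net/tcp6 layout, where the fourth column is the socket state (0A means LISTEN) and the tenth is the inode.

```python
import os
import re

SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")
TCP_LISTEN = "0A"  # state code used in /proc/net/tcp and /proc/net/tcp6


def parse_socket_inode(link_target):
    """Return the inode from a 'socket:[N]' symlink target, else None."""
    match = SOCKET_LINK.match(link_target)
    return int(match.group(1)) if match else None


def socket_fds(pid):
    """Map fd number -> socket inode for every socket fd of a process."""
    result = {}
    fd_dir = "/proc/%d/fd" % pid
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd was closed while we were iterating
        inode = parse_socket_inode(target)
        if inode is not None:
            result[int(name)] = inode
    return result


def listening_port(proc_net_lines, inode):
    """Return the local port if inode is a LISTEN socket, else None.

    proc_net_lines is an iterable of lines from /proc/net/tcp or
    /proc/net/tcp6.
    """
    for line in proc_net_lines:
        fields = line.split()
        if len(fields) < 10 or not fields[0].endswith(":"):
            continue  # header or malformed line
        if fields[3] == TCP_LISTEN and fields[9] == str(inode):
            # Local address is "<hex address>:<hex port>".
            return int(fields[1].rsplit(":", 1)[1], 16)
    return None
```

For the process above, `socket_fds(2428)[7]` should give 18972, and `listening_port(open("/proc/net/tcp6"), 18972)` should give 636 if fd 7 really is the LDAPS listener.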
A backtrace from GDB follows at the end of this message - it shows the address
struct, which just contains the source address of the last connection to port
636 before DirSrv hangs.
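GDB prints part of that address struct with octal escapes, which makes the source address hard to spot. The following small Python sketch decodes the leading part of the `path` value from frame #2 of the backtrace; the helper name `decode_gdb_octal` is only illustrative. The decoded address is the same client address that appears in the KDC log line further down.

```python
import re


def decode_gdb_octal(text):
    """Replace gdb-style \\ooo octal escapes with the characters they encode."""
    return re.sub(r"\\([0-7]{1,3})", lambda m: chr(int(m.group(1), 8)), text)


# Leading part of the `path` value from frame #2 of the backtrace.
escaped = r"\061\071\063.63.63.108"
print(decode_gdb_octal(escaped))  # prints 193.63.63.108
```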
The server is configured to use the FreeIPA dns service as its own resolver.
The DNS service is definitely still running, and resolves the query fine when
executed with dig.
There is nothing in the DirSrv logs that indicates an issue. The KDC logs do
show a problem, but I don't know whether DirSrv is hanging because of the KDC,
or whether the KDC is just reflecting that DirSrv is unresponsive.
Jan 06 21:53:29 ldap-001.domain krb5kdc[2702](info): AS_REQ (6 etypes {18 17 16 23 25 26}) 193.63.63.108: LOOKING_UP_CLIENT: host/ldap-001.domain@DOMAIN for krbtgt/DOMAIN@DOMAIN, Server error
Jan 06 21:53:29 ldap-001.domain krb5kdc[2702](info): closing down fd 12
sssd reports an issue too, but that is almost certainly due to an unresponsive
DirSrv:
(Sat Jan 7 03:16:08 2017) [sssd[nss]] [sss_dp_get_reply] (0x0010): The Data Provider returned an error [org.freedesktop.sssd.Error.DataProvider.Offline]
I'm not really sure what to check next - all the individual components seem to
be working, but not together.
Any suggestions are appreciated.
Regards,
Adam Bishop
gpg: E75B 1F92 6407 DFDF 9F1C BF10 C993 2504 6609 D460
jisc.ac.uk
---
[root@ldap-001 log]# gdb -p 2428
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 2428
0x00007fc80bf4fdfd in poll () at ../sysdeps/unix/syscall-template.S:81
81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
Missing separate debuginfos, use: debuginfo-install
ipa-server-4.4.0-14.el7.centos.1.1.x86_64
(gdb) break getpeername
Breakpoint 1 at 0x7fc80bf5b4b0: file ../sysdeps/unix/syscall-template.S, line
81.
(gdb) cont
Continuing.
Breakpoint 1, getpeername () at ../sysdeps/unix/syscall-template.S:81
81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt full
#0 getpeername () at ../sysdeps/unix/syscall-template.S:81
No locals.
#1 0x00007fc80c888389 in pt_GetPeerName (fd=0x7fc810d92010,
addr=0x7ffeb749af70) at ../../../nspr/pr/src/pthreads/ptio.c:2795
rv = -1
addr_len = 112
#2 0x00007fc80d3fec23 in ssl_Poll (fd=0x7fc810b69260, how_flags=<optimized
out>, p_out_flags=0x7ffeb749b06c) at sslsock.c:2639
ss = 0x7fc810d94f30
new_flags = 1
addr = {raw = {family = 0, data = '\000' <repeats 13 times>}, inet =
{family = 0, port = 0, ip = 0, pad = "\000\000\000\000\000\000\000"}, ipv6 =
{family = 0, port = 0, flowinfo = 0,
ip = {_S6_un = {_S6_u8 = '\000' <repeats 15 times>, _S6_u16 = {0,
0, 0, 0, 0, 0, 0, 0}, _S6_u32 = {0, 0, 0, 0}, _S6_u64 = {0, 0}}}, scope_id =
0}, local = {family = 0,
path = '\000' <repeats 30 times>,
"\061\071\063.63.63.108\000\000\000`\327!\f\310\177\000\000\017\000\000\000\000\000\000\000p\260I\267\376\177\000\000\000\000\000\000\000\000\000\000\372",
'\000' <repeats 15 times>, "\372\000\000\000\000\000\000\000\215", <incomplete
sequence \343>}}
#3 0x00007fc80c887a45 in _pr_poll_with_poll (pds=0x7fc811256b40, npds=15,
timeout=timeout@entry=250) at ../../../nspr/pr/src/pthreads/ptio.c:3812
in_flags_read = 0
in_flags_write = 0
out_flags_read = 0
out_flags_write = 0
stack_syspoll = {{fd = 50, events = 1, revents = 0}, {fd = 6, events =
1, revents = 0}, {fd = 7, events = 1, revents = 0}, {fd = 8, events = 1,
revents = 0}, {fd = 66, events = 1,
revents = 0}, {fd = 80, events = 1, revents = 0}, {fd = 79, events
= 1, revents = 0}, {fd = 78, events = 1, revents = 0}, {fd = 77, events = 1,
revents = 0}, {fd = 76, events = 1,
revents = 0}, {fd = 75, events = 1, revents = 0}, {fd = 73, events
= 1, revents = 0}, {fd = 71, events = 1, revents = 0}, {fd = 70, events = 1,
revents = 0}, {fd = 68, events = 1,
revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 1219907217,
events = -32767, revents = -1}, {fd = 2, events = 32766, revents = 0}, {fd = 0,
events = 0, revents = 0}, {fd = 0,
events = 0, revents = 0}, {fd = 48, events = 91, revents = 0}, {fd
= -1219907216, events = 32766, revents = 0}, {fd = 0, events = 0, revents = 0},
{fd = 0, events = 0, revents = 0}, {
fd = 110, events = 119, revents = 0}, {fd = 0, events = 0, revents
= 0}, {fd = -1219907217, events = 32766, revents = 0}, {fd = 0, events = 0,
revents = 0}, {fd = -1219907201,
events = 32766, revents = 0}, {fd = 203544416, events = 32712,
revents = 0}, {fd = 124, events = 0, revents = 0}, {fd = 2560, events = 0,
revents = 0}, {fd = 1219907089,
events = -32767, revents = -1}, {fd = 3, events = 32712, revents =
0}, {fd = 0, events = 0, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd =
48, events = 91, revents = 0}, {
fd = -1219907088, events = 32766, revents = 0}, {fd = 0, events =
0, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 110, events = 119,
revents = 0}, {fd = 0, events = 0,
revents = 0}, {fd = -1219907089, events = 32766, revents = 0}, {fd
= 210264088, events = 32712, revents = 0}, {fd = 1, events = 0, revents = 0},
{fd = 287047696, events = 32712,
revents = 0}, {fd = -1, events = 0, revents = 0}, {fd = 0, events =
0, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 0, events = 0,
revents = 0}, {fd = 0, events = 0,
revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 0, events =
0, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 287320512, events =
32712, revents = 0}, {fd = 210265391,
events = 32712, revents = 0}, {fd = 0, events = 0, revents = 0},
{fd = 281542400, events = 32712, revents = 0}, {fd = 287320512, events = 32712,
revents = 0}, {fd = -133551240,
events = 32711, revents = 0}, {fd = 0, events = 0, revents = 0},
{fd = 246979857, events = 32712, revents = 0}, {fd = 5, events = 15, revents =
0}, {fd = -1219906728, events = 32766,
revents = 0}}
syspoll = 0x7ffeb749b070
index = 2
msecs = <optimized out>
ready = 0
start = <optimized out>
elapsed = <optimized out>
remaining = <optimized out>
#4 0x00007fc80c88a655 in PR_Poll (pds=<optimized out>, npds=<optimized out>,
timeout=timeout@entry=250) at ../../../nspr/pr/src/pthreads/ptio.c:4324
No locals.
#5 0x00007fc80eb8d789 in slapd_daemon (ports=ports@entry=0x7ffeb749b630) at
ldap/servers/slapd/daemon.c:1242
select_return = 0
prerr = <optimized out>
n_tcps = 0x7fc810b6db30
s_tcps = 0x7fc810b6da30
i_unix = 0x7fc810b6da10
fdesp = 0x0
num_poll = 15
pr_timeout = 250
time_thread_p = 0x7fc8111ff350
threads = <optimized out>
in_referral_mode = 0
tp = 0x0
tp_config = {init_flag = 1219906497, initial_threads = -32767,
max_threads = 9, stacksize = 0, event_queue_size = 2, work_queue_size = 0,
log_fct = 0x0,
log_start_fct = 0xffff800148b64ba1, log_close_fct = 0x7ffe0000000a,
malloc_fct = 0x2, calloc_fct = 0x0, realloc_fct = 0x5b00000032, free_fct =
0x7ffeb749b460}
#6 0x00007fc80eb7f253 in main (argc=5, argv=0x7ffeb749bc68) at
ldap/servers/slapd/main.c:1143
return_value = 0
slapdFrontendConfig = <optimized out>
ports_info = {n_port = 389, s_port = 636, n_listenaddr =
0x7fc810b6dc40, s_listenaddr = 0x7fc810b6dba0, n_socket = 0x7fc810b6db30,
i_listenaddr = 0x7fc810b6db50, i_port = 1,
i_socket = 0x7fc810b6da10, s_socket = 0x7fc810b6da30}
m = <optimized out>
notify = <optimized out>