Hello libev team, sorry to bother you.
We are a small development team; about 20 users connect to our ocserv server
through Cisco AnyConnect. I don't know how to reproduce this problem manually,
but in our setup the ocserv-main process exits with a segmentation fault, and
all users are dropped from the VPN at the same time. This happens 1-2 times
every day. It is not a fault at startup; it occurs after the server has been
running for a while.
Judging from the core dumps, I think this is a problem in libev.
The dmesg -T log further down lists the faults from the most recent week.
This problem is very frustrating for us because it interrupts our work.
The first gdb session in this mail is from the latest fault, which happened
today.
[admin@vpn ~]$ uname -a
Linux vpn.kofo.io 3.10.0-957.5.1.el7.x86_64 #1 SMP Fri Feb 1 14:54:57 UTC 2019
x86_64 x86_64 x86_64 GNU/Linux
[admin@vpn ~]$ cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
[admin@vpn ~]$ rpm -qa | grep ocserv
ocserv-0.12.3-1.el7.x86_64
ocserv-debuginfo-0.12.3-1.el7.x86_64
I tested two versions of libev:
==========with libev-4.15-7.el7.x86_64.rpm==========
[admin@vpn ~]$ gdb /usr/sbin/ocserv
/tmp/core-ocserv-main-sig11-user0-group0-pid1778-time1561007979
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/ocserv...Reading symbols from
/usr/lib/debug/usr/sbin/ocserv.debug...done.
done.
warning: core file may not match specified executable file.
[New LWP 1778]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `ocserv-main
'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000000000 in ?? ()
(gdb) where
#0 0x0000000000000000 in ?? ()
#1 0x00007f305fc553d5 in ev_invoke_pending (loop=0x7f305fe5ea40
<default_loop_struct>) at ev.c:3322
#2 0x00007f305fc585b5 in ev_run (loop=0x7f305fe5ea40 <default_loop_struct>,
flags=flags@entry=0) at ev.c:3726
#3 0x000055d4d269d7da in main (argc=<optimized out>, argv=<optimized out>) at
main.c:1440
(gdb) l
1222 static void syserr_cb (const char *msg)
1223 {
1224 main_server_st *s = ev_userdata(loop);
1225
1226 mslog(s, NULL, LOG_ERR, "libev fatal error: %s", msg);
1227 abort();
1228 }
1229
1230 int main(int argc, char** argv)
1231 {
(gdb) quit
[admin@vpn ~]$ dmesg -T | tail
[Mon Jun 17 20:07:38 2019] traps: ocserv-main[4398] general protection
ip:7fb9b653e35c sp:7ffcfdb6ce50 error:0 in libev.so.4.0.0[7fb9b6536000+d000]
[Mon Jun 17 20:12:10 2019] traps: ocserv-main[4708] general protection
ip:7fb9b653e35c sp:7ffcfdb6ce50 error:0 in libev.so.4.0.0[7fb9b6536000+d000]
[Mon Jun 17 20:12:28 2019] traps: ocserv-main[4743] general protection
ip:7fb9b653e35c sp:7ffcfdb6ce50 error:0 in libev.so.4.0.0[7fb9b6536000+d000]
[Mon Jun 17 20:12:56 2019] traps: ocserv-main[4767] general protection
ip:7fb9b653e35c sp:7ffcfdb6ce50 error:0 in libev.so.4.0.0[7fb9b6536000+d000]
[Mon Jun 17 20:13:34 2019] traps: ocserv-main[4819] general protection
ip:7fb9b653e35c sp:7ffcfdb6ce50 error:0 in libev.so.4.0.0[7fb9b6536000+d000]
[Mon Jun 17 20:16:38 2019] ocserv-main[14426]: segfault at 55686ec0add8 ip
00007fb9b653aabc sp 00007ffcfdb6cec0 error 6 in
libev.so.4.0.0[7fb9b6536000+d000]
[Tue Jun 18 03:30:28 2019] traps: ocserv-main[5392] general protection
ip:7f5e1e8bcc48 sp:7ffc2e520af0 error:0 in libev.so.4.0.0[7f5e1e8b8000+d000]
[Tue Jun 18 12:47:01 2019] ocserv-main[6841]: segfault at 0 ip (null)
sp 00007fffd1ace668 error 14 in ocserv[558a01f44000+5c000]
[Tue Jun 18 20:20:22 2019] traps: ocserv-main[25818] general protection
ip:7f49ca78cc48 sp:7ffe69d8fc50 error:0 in libev.so.4.0.0[7f49ca788000+d000]
[Thu Jun 20 13:18:57 2019] ocserv-main[1778]: segfault at 0 ip (null)
sp 00007ffe0e0a4858 error 14 in ocserv[55d4d2691000+5c000]
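
To make the libev faults easier to compare, the offset of each faulting
instruction inside libev.so.4.0.0 can be derived from the dmesg lines above by
subtracting the mapping base (the value before "+d000") from the ip value. The
following is only a small Python sketch of that arithmetic; the helper name
fault_offset is made up for illustration, and it assumes that the base printed
by the kernel is the start of the library's text mapping.

```python
# (ip, mapping base) pairs copied from the dmesg output above, as hex strings.
faults = [
    ("7fb9b653e35c", "7fb9b6536000"),  # general protection, Jun 17 (repeated 5 times)
    ("7fb9b653aabc", "7fb9b6536000"),  # segfault, Jun 17 20:16:38
    ("7f5e1e8bcc48", "7f5e1e8b8000"),  # general protection, Jun 18 03:30:28
    ("7f49ca78cc48", "7f49ca788000"),  # general protection, Jun 18 20:20:22
]


def fault_offset(ip_hex, base_hex):
    """Return the offset of the faulting ip relative to the mapping base."""
    return int(ip_hex, 16) - int(base_hex, 16)


offsets = [hex(fault_offset(ip, base)) for ip, base in faults]
print(offsets)  # ['0x835c', '0x4abc', '0x4c48', '0x4c48']
```

The last two faults land on the same offset (0x4c48), so they are probably the
same crash site. Offsets are only comparable between runs that used the same
libev build.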
==========with libev 4.25 (manually compiled and installed)==========
dmesg -T:
[Tue Jun 18 20:20:21 2019] traps: ocserv-main[25818] general protection
ip:7f49ca78cc48 sp:7ffe69d8fc50 error:0 in libev.so.4.0.0[7f49ca788000+d000]
[admin@vpn tmp]$ sudo file
/tmp/core-ocserv-main-sig11-user0-group0-pid25818-time1560860462
/tmp/core-ocserv-main-sig11-user0-group0-pid25818-time1560860462: ELF 64-bit
LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'ocserv-main', real
uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn:
'/usr/sbin/ocserv', platform: 'x86_64'
Unix time: 1560860462 = 2019/6/18 20:21:02 CST
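
For anyone who wants to check that this core file belongs to the dmesg entry
above, the timestamp in the core file name can be converted as follows. This is
a minimal Python sketch and assumes that CST here means China Standard Time
(UTC+8).

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in this mail is China Standard Time, i.e. UTC+8.
CST = timezone(timedelta(hours=8))

# Timestamp taken from the core file name (...-time1560860462).
core_time = datetime.fromtimestamp(1560860462, tz=CST)
print(core_time.isoformat())  # 2019-06-18T20:21:02+08:00
```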
[admin@vpn tmp]$ sudo chmod +r
core-ocserv-main-sig11-user0-group0-pid25818-time1560860462
[admin@vpn ~]$ gdb /usr/sbin/ocserv
/tmp/core-ocserv-main-sig11-user0-group0-pid25818-time1560860462
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/ocserv...Reading symbols from
/usr/lib/debug/usr/sbin/ocserv.debug...done.
done.
warning: core file may not match specified executable file.
[New LWP 25818]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `ocserv-main
'.
Program terminated with signal 11, Segmentation fault.
#0 child_reap (status=0, pid=1619, chain=1619, loop=0x7f49ca995a40
<default_loop_struct>) at ev.c:2658
2658 if ((w->pid == pid || !w->pid)
(gdb) where
#0 child_reap (status=0, pid=1619, chain=1619, loop=0x7f49ca995a40
<default_loop_struct>) at ev.c:2658
#1 childcb (loop=0x7f49ca995a40 <default_loop_struct>, sw=<optimized out>,
revents=<optimized out>) at ev.c:2690
#2 0x00007f49ca78c3d5 in ev_invoke_pending (loop=0x7f49ca995a40
<default_loop_struct>) at ev.c:3322
#3 0x00007f49ca78f5b5 in ev_run (loop=0x7f49ca995a40 <default_loop_struct>,
flags=flags@entry=0) at ev.c:3726
#4 0x0000559f444867da in main (argc=<optimized out>, argv=<optimized out>) at
main.c:1440
(gdb) l
2653 ev_child *w;
2654 int traced = WIFSTOPPED (status) || WIFCONTINUED (status);
2655
2656 for (w = (ev_child *)childs [chain & ((EV_PID_HASHSIZE) - 1)]; w; w = (ev_child *)((WL)w)->next)
2657 {
2658 if ((w->pid == pid || !w->pid)
2659 && (!traced || (w->flags & 1)))
2660 {
2661 ev_set_priority (w, EV_MAXPRI); /* need to do it *now*, this *must* be the same prio as the signal watcher itself */
2662 w->rpid = pid;
(gdb) p w
$2 = (ev_child *) 0x2d3832312d534541
(gdb) p w->pid
Cannot access memory at address 0x2d3832312d53456d
So w is a wild pointer, and w->pid cannot be read.
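
One observation that may help, offered as a hint rather than a diagnosis: on
little-endian x86_64, the bytes of the pointer value printed by gdb are all
printable ASCII. A small Python sketch to decode them:

```python
# Value of w as printed by gdb above.
w = 0x2D3832312D534541

# x86_64 is little-endian, so the lowest byte comes first in memory.
raw = w.to_bytes(8, byteorder="little")
text = raw.decode("ascii")
print(text)  # AES-128-
```

The value decodes to "AES-128-", which looks like the start of a cipher name.
This suggests, but does not prove, that the entry in the childs[] hash chain
was overwritten with string data from somewhere else in the process.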