Hello all,

I have been running strongSwan on several of my networks for a while
and have run into a few stability issues.  I am working on finding
the root cause of some of them and was wondering whether anyone else
has seen these issues:

1.)  DPD connections with dpdaction=restart sometimes go down
and never come back.  The most common form of this is the
CHILD_SA going away and never being re-established.  I am
working on getting better debug messages from charon to figure
out whether charon is missing kernel notifications or simply
isn't re-establishing the CHILD_SAs correctly.  This problem
seems to be worse over lower-bandwidth connections.

This bug usually takes a while to hit.  The first time I saw it
was after a tunnel had been up for about 57 hours; the fastest I
have hit it so far is about 19 hours.  Some of my connections
haven't hit this problem at all in the weeks they have been up.

Some of these problems may be documented in:
https://lists.strongswan.org/pipermail/users/2009-June/003516.html
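To get more detail from charon while chasing the CHILD_SA problem, one
option is to raise its log levels in the config setup section.  This is
a minimal sketch, assuming the charondebug option and the subsystem
names (dmn, mgr, ike, chd, job, cfg, knl, net) described in the 4.x
ipsec.conf man page; check "man ipsec.conf" for what your build accepts.
The levels chosen here are illustrative, not a recommendation.

```
config setup
   plutostart=no
   charonstart=yes
   strictcrlpolicy=no
   # Illustrative debug levels (assumed values, tune as needed):
   # knl = kernel interface (netlink events), chd = CHILD_SA handling,
   # ike = IKE_SA handling, job = scheduled/processed jobs.
   # Level 4 would also log private key material, so stay at 2 or 3.
   charondebug="dmn 1, mgr 2, ike 2, chd 2, job 2, cfg 1, knl 2, net 1"
```

Lining up the knl messages about received kernel events with the chd
messages for the affected connection should help tell apart the two
possibilities above: charon never seeing the notification, or seeing it
and then failing to re-establish the CHILD_SA.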

2.)  Sometimes connections get into rekeying wars in which
both ends start logging:
deleting duplicate IKE_SA for peer 'w.x.y.z' due to uniqueness policy

which causes a rekey, which creates another duplicate, which causes
another rekey, and so on.  Note that only one end is configured to
initiate the connection (auto=start, dpdaction=restart); the other end
uses auto=add, dpdaction=clear.  This bug can also take hours or days
to hit, and it is rare: I have only hit it twice in all my testing.
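One experiment that may help isolate the rekeying war is to disable the
uniqueness policy on the responder (the auto=add side), so that charon
stops deleting what it considers duplicate IKE_SAs for the same peer.
This is a diagnostic sketch under that assumption, not a fix: with
uniqueids=no, stale IKE_SAs for a peer are no longer torn down
automatically.  Whether 4.3.4 accepts values other than yes/no should
be checked in "man ipsec.conf".

```
config setup
   plutostart=no
   charonstart=yes
   strictcrlpolicy=no
   # Diagnostic only: keep existing IKE_SAs instead of deleting them
   # when the same peer identity sets up a new one.
   uniqueids=no
```

If the loop stops with this setting, the next question is why the
initiator sets up a second IKE_SA while the first one is still alive on
the responder.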

3.)  Sometimes charon locks up.  I have seen this happen in many
different forms, and I hit some variant of it about once a week.
This family of bugs is especially nasty because recovering requires
killing and restarting processes.
Here is one such trace:
(gdb) thread apply all bt 15
Thread 6 (Thread 0x488304d0 (LWP 15785)):
#0  0x0ff8df60 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
#1  0x0ffd7360 in ?? () from /usr/lib/libstrongswan.so.0
#2  0x10023884 in schedule (this=0x10067f28) at processing/scheduler.c:223
#3  0x100219d8 in execute (this=0x100680e0)
    at processing/jobs/callback_job.c:145
#4  0x100242e4 in process_jobs (this=0x1006a0b8) at processing/processor.c:123
#5  0x0ff87b34 in start_thread () from /lib/libpthread.so.0
#6  0x0fdf8b94 in clone () from /lib/libc.so.6
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 5 (Thread 0x490304d0 (LWP 15786)):
#0  0x0ff92594 in recvfrom () from /lib/libpthread.so.0
#1  0x0f811720 in receive_events (this=<value optimized out>)
    at kernel_netlink_ipsec.c:748
#2  0x100219d8 in execute (this=0x1006e550)
    at processing/jobs/callback_job.c:145
#3  0x100242e4 in process_jobs (this=0x1006a0b8) at processing/processor.c:123
#4  0x0ff87b34 in start_thread () from /lib/libpthread.so.0
#5  0x0fdf8b94 in clone () from /lib/libc.so.6
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 4 (Thread 0x498304d0 (LWP 15787)):
#0  0x0ff92594 in recvfrom () from /lib/libpthread.so.0
#1  0x0f8175c0 in receive_events (this=0x1006e620) at kernel_netlink_net.c:498
#2  0x100219d8 in execute (this=0x1006e7a8)
    at processing/jobs/callback_job.c:145
#3  0x100242e4 in process_jobs (this=0x1006a0b8) at processing/processor.c:123
#4  0x0ff87b34 in start_thread () from /lib/libpthread.so.0
#5  0x0fdf8b94 in clone () from /lib/libc.so.6
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 3 (Thread 0x4a0304d0 (LWP 15788)):
#0  0x0ff8d930 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0ffd74ec in ?? () from /usr/lib/libstrongswan.so.0
#2  0x100213f4 in send_packets (this=0x10070888) at network/sender.c:97
#3  0x100219d8 in execute (this=0x100709d8)
    at processing/jobs/callback_job.c:145
#4  0x100242e4 in process_jobs (this=0x1006a0b8) at processing/processor.c:123
#5  0x0ff87b34 in start_thread () from /lib/libpthread.so.0
#6  0x0fdf8b94 in clone () from /lib/libc.so.6
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 2 (Thread 0x4a8304d0 (LWP 15791)):
#0  0x0fdf0798 in select () from /lib/libc.so.6
#1  0x1004b5f8 in receiver (this=0x10069020, packet=0x4a82f93c)
    at network/socket-raw.c:148
#2  0x10020b7c in receive_packets (this=0x10070aa8) at network/receiver.c:266
#3  0x100219d8 in execute (this=0x10070b88)
    at processing/jobs/callback_job.c:145
#4  0x100242e4 in process_jobs (this=0x1006a0b8) at processing/processor.c:123
#5  0x0ff87b34 in start_thread () from /lib/libpthread.so.0
#6  0x0fdf8b94 in clone () from /lib/libc.so.6
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 1 (Thread 0x48022110 (LWP 15780)):
#0  0x0ff8d930 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0ffd74b8 in ?? () from /usr/lib/libstrongswan.so.0
#2  0x10030a5c in flush (this=0xfff701c) at sa/ike_sa_manager.c:1552
#3  0x10011270 in destroy (this=0x10067970) at daemon.c:177
#4  0x100125fc in main (argc=<value optimized out>,
    argv=<value optimized out>) at daemon.c:790

All of my ipsec.conf files look very similar to:
version 2.0

config setup
   plutostart=no
   charonstart=yes
   strictcrlpolicy=no

conn host-host-1
   ike=aes256-sha2_256-modp1536,aes256-sha1-modp1536,aes128-sha2_256-modp1536,aes128-sha1-modp1536,3des-sha2_256-modp1536,3des-sha1-modp1536
   esp=aes256-sha2_256-modp1536,aes256-sha1-modp1536,aes128-sha2_256-modp1536,aes128-sha1-modp1536,3des-sha2_256-modp1536,3des-sha1-modp1536
   mobike=no
   pfs=yes
   pfsgroup=modp1536
   leftupdown=/usr/lib/ipsec/my_updown
   keyingtries=%forever
   dpdaction=restart
   dpddelay=60
   left=192.166.1.1
   right=192.166.1.2
   auto=start
   authby=secret
   keyexchange=ikev2

conn net-net-1-2-2
   leftsubnet=10.201.0.0/16
   rightsubnet=192.167.1.0/24
   also=host-host-1

conn net-net-1-2-1
   leftsubnet=10.201.0.0/16
   rightsubnet=192.168.2.0/24
   also=host-host-1

conn net-host-1-2
   leftsubnet=10.201.0.0/16
   also=host-host-1

conn host-net-1-2
   rightsubnet=192.167.1.0/24
   also=host-host-1

conn host-net-1-1
   rightsubnet=192.168.2.0/24
   also=host-host-1


I think strongSwan is great.  It is one of the easiest IKE daemons
to configure, but I would like better stability.  I have no problem
working on the source code to try to solve some of these problems,
but I don't want to duplicate work or fight against known issues.  If
you know anything about any of these bugs, please let me know.

I am currently running strongSwan 4.3.4 on Linux 2.6.29.3 (powerpc).

Thanks,

Barry
_______________________________________________
Users mailing list
Users@lists.strongswan.org
https://lists.strongswan.org/mailman/listinfo/users
