On Fri, Jul 18, 2014 at 09:36:23AM +0000, Stuart Henderson wrote:
> On 2014-07-17, Darryl Wisneski <[email protected]> wrote:
> > netstat -s -p udp |grep "dropped due to full socket"
> > 345197 dropped due to full socket buffers
>
> We're assuming this relates to openvpn packets but you can check which
> sockets have queues: (the sendq/recvq counters).
>
> netstat -an | grep -v ' 0 0'
Sometimes a socket will show a queue (Recv-Q) and netstat -s will show the
"dropped due to full socket buffers" counter increasing; we usually see a
SIP drop at the same time. The two are not always correlated, but mostly
are. For example:
udp 8846 0 xxx.100.173.xxx.1195 *.*
udp 1448 0 xxx.100.173.xxx.1195 *.*
udp 129 0 xxx.100.173.xxx.1194 *.*
udp 177 0 xxx.100.173.xxx.1195 *.*
udp 354 0 xxx.100.173.xxx.1195 *.*
udp 21115 0 xxx.100.173.xxx.1194 *.*
udp 241 0 10.0.0.254.1195 *.*
udp 2988 0 10.0.0.254.1195 *.*
udp 193 0 xxx.100.173.xxx.1195 *.*
udp 19591 0 xxx.100.173.xxx.1195 *.*
udp 241 0 10.0.0.254.1195 *.*
udp 20043 0 xxx.100.173.xxx.1195 *.*
udp 11878 0 xxx.100.173.xxx.1195 *.*
udp 177 0 xxx.100.173.xxx.1195 *.*
udp 193 0 xxx.100.173.xxx.1195 *.*
udp 129 0 xxx.100.173.xxx.1194 *.*
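For the record, one quick way to correlate the counter with the per-socket
queues is a loop like this (a rough sketch; the 5-second interval is
arbitrary):

while :; do
    date
    netstat -s -p udp | grep 'dropped due to full socket'
    netstat -an | grep '^udp' | grep -v ' 0 0'
    sleep 5
done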
>
> So if things are building up here rather than on the interface queue,
> there ought to be a reason why it's slow to drain.
>
> Are you doing queueing?
We started doing queueing very recently, but the same symptoms were
occurring before, when we had no queueing. Until very recently we were
also on a 50Mbit link; the extra bandwidth did not help.
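For completeness, new-style 5.5 queueing of the OpenVPN/SIP traffic would
look roughly like this (interface, queue names, ports and bandwidth
figures are placeholders, not our actual pf.conf):

queue outq on em0 bandwidth 100M
queue voip parent outq bandwidth 20M
queue bulk parent outq bandwidth 80M default
match out on em0 proto udp to port { 1194, 1195, 5060 } set queue voip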
>
> How is fragmentation being handled? In OpenVPN or relying on the kernel
> to do it? Or are you using small mtu anyway to avoid frags?
We are not tuning for fragmentation, nor are we setting mtu on
the endpoint.
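So none of the OpenVPN-side knobs are set today; for reference they would
be something like the following in the tunnel configs on both ends
(illustrative values only):

# MTU of the tun device
tun-mtu 1500
# let OpenVPN do its own fragmentation of UDP payloads
fragment 1300
# clamp TCP MSS inside the tunnel to avoid fragmentation
mssfix 1300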
>
> How does pfctl -si look?
> sudo pfctl -si
Status: Enabled for 1 days 15:58:58 Debug: err
State Table Total Rate
current entries 2694
searches 636512596 4422.1/s
inserts 2978926 20.7/s
removals 2977267 20.7/s
Counters
match 3349507 23.3/s
[snip]
everything else 0.0/s
>
> > I'm not sure how to proceed on tuning as I read tuning via sysctl is
> > becoming pointless.
>
> It's preferred if things can auto-tune without touching sysctl, but not
> everything is done that way.
>
> > net.inet.udp.sendspace=131028 # udp send buffer
>
> This may possibly need increasing though is already quite large. (while
> researching this mail it seems FreeBSD doesn't have this, does anyone here
> know what they do instead?)
We have toggled net.inet.udp.sendspace and net.inet.udp.recvspace between
131028 and 262144 with no improvements. Anything higher and we get a
hosed system...
> ifconfig
ifconfig: socket: No buffer space available
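For reference, checking and changing the buffers looks roughly like this
(262144 being the value mentioned above; the same name=value lines go in
/etc/sysctl.conf to survive a reboot):

# current values
sysctl net.inet.udp.recvspace net.inet.udp.sendspace
# change at runtime
sysctl net.inet.udp.recvspace=262144
sysctl net.inet.udp.sendspace=262144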
>
> > net.inet.ip.ifq.maxlen=1536
>
> Monitor net.inet.ip.ifq.drops, is there an increase?
No increases in net.inet.ip.ifq.drops through time.
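(For anyone wanting to reproduce the check, a loop along these lines is
enough; the interval is arbitrary:)

while :; do
    date
    sysctl net.inet.ip.ifq.len net.inet.ip.ifq.drops
    sleep 10
done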
> This is already a fairly large buffer though (especially as I think you
> mentioned 100Mb). How did you choose 1536?
Google, plus trial and error.
>
> > kern.bufcachepercent=90 # kernel buffer cache memory percentage
>
> This won't help OpenVPN. Is this box also doing other things?
This box is also running IPsec and has four OpenVPN tunnels terminated on
it. We are running collectd, symon, and dhcpd. The load hovers between
2 and 4.
2 users Load 3.09 3.07 2.91 Fri Jul 18 12:34:19 2014
  PID USER     NAME      CPU  10\ 20\ 30\ 40\ 50\ 60\ 70\ 80\ 90\ 100\
 8941 root     acpi0   83.79 #################################################
      <idle>           66.65 #######################################
   22 root     openvpn  2.49 #
23727 root     openvpn  1.37
 5473 root     openvpn  1.27
load averages: 3.82, 3.08, 2.77 fw0.xxx.xxx 12:00:21
86 processes: 85 idle, 1 on processor
CPU0 states: 0.0% user, 0.0% nice, 86.6% system, 8.8% interrupt, 4.6% idle
CPU1 states: 0.4% user, 0.0% nice, 7.8% system, 0.2% interrupt, 91.6% idle
CPU2 states: 0.0% user, 0.0% nice, 5.2% system, 0.2% interrupt, 94.6% idle
CPU3 states: 0.6% user, 0.0% nice, 5.0% system, 0.0% interrupt, 94.4% idle
Memory: Real: 196M/3236M act/tot Free: 28G Cache: 2752M Swap: 0K/4095M
PID USERNAME PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND
23727 root 2 0 19M 13M sleep/3 poll 858:57 4.15% openvpn
22 root 2 0 15M 10M sleep/2 poll 9:36 2.05% openvpn
31796 root 2 0 4364K 5024K sleep/1 poll 3:17 1.51% openvpn
14650 dkw 2 0 56M 46M sleep/1 kqread 18:08 0.78% tmux
10633 _iftop 2 0 6748K 7032K sleep/2 poll 12:38 0.20% iftop
11174 dkw 10 0 340K 112K sleep/3 nanosle 0:00 0.05% sleep
13647 named 2 0 43M 44M sleep/1 select 35:07 0.00% named
5971 root 2 0 956K 2852K sleep/3 select 8:39 0.00% symux
5473 root 2 0 3824K 4576K sleep/1 poll 5:25 0.00% openvpn
19387 root 2 0 2736K 4200K sleep/2 poll 5:21 0.00% openvpn
24267 _collect 10 0 3724K 2356K sleep/2 nanosle 3:28 0.00% collectd
31304 dkw 18 0 1316K 2984K sleep/1 pause 2:53 0.00% zsh
12919 _pflogd 4 0 872K 484K sleep/3 bpf 2:01 0.00% pflogd
18414 _symon 10 0 672K 1224K sleep/1 nanosle 1:17 0.00% symon
18218 root 2 0 12M 4516K sleep/2 select 1:00 0.00% dhcpd
13258 dkw 2 0 3808K 2768K sleep/2 select 0:55 0.00% sshd
>
> How does kern.netlivelocks look? (monitor it over time rather than just
> looking at the total).
> sysctl -a |grep kern.netlivelocks
>
kern.netlivelocks=1
It has not changed over time.
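A trivial loop is enough to sample it over time, e.g.:

while :; do date; sysctl kern.netlivelocks; sleep 60; done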
-dkw
>
> > OpenBSD 5.5 (GENERIC.MP) #315: Wed Mar 5 09:37:46 MST 2014
> > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > real mem = 34301018112 (32712MB)
> > avail mem = 33379323904 (31833MB)
>
> thanks for including the full dmesg.