On Fri, Jul 18, 2014 at 09:36:23AM +0000, Stuart Henderson wrote:
> On 2014-07-17, Darryl Wisneski <s...@commwebworks.com> wrote:
> > netstat -s -p udp | grep "dropped due to full socket"
> > 345197 dropped due to full socket buffers
>
> We're assuming this relates to openvpn packets but you can check which
> sockets have queues (the sendq/recvq counters):
>
> netstat -an | grep -v ' 0 0'
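Along the same lines as that suggestion, a small sketch (my own, not from the thread) that lists only sockets with a non-empty queue regardless of column spacing, and then samples the drop counter twice to see whether drops are still accumulating. The awk column positions assume the OpenBSD netstat -an layout (Proto Recv-Q Send-Q Local Foreign); the 10-second interval is an arbitrary choice:

```shell
# Hypothetical helper, not from the thread: show only sockets whose
# receive or send queue is non-empty, independent of column spacing.
# Assumes OpenBSD netstat -an layout: Proto Recv-Q Send-Q Local Foreign.
netstat -an | awk '$1 ~ /^(tcp|udp)/ && ($2 + 0 > 0 || $3 + 0 > 0)'

# Sample the "dropped due to full socket buffers" counter twice and
# print the delta, to see whether drops are still accumulating now.
a=$(netstat -s -p udp | awk '/dropped due to full socket buffers/ {print $1}')
sleep 10
b=$(netstat -s -p udp | awk '/dropped due to full socket buffers/ {print $1}')
echo "drops in last 10s: $((b - a))"
```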
Sometimes now, sockets will display a queue and netstat -s will show the
"dropped due to full socket buffers" counter increase; we usually see a
SIP drop at the same time. The two are not always correlated, but mostly
are.

udp     8846    0  xxx.100.173.xxx.1195  *.*
udp     1448    0  xxx.100.173.xxx.1195  *.*
udp      129    0  xxx.100.173.xxx.1194  *.*
udp      177    0  xxx.100.173.xxx.1195  *.*
udp      354    0  xxx.100.173.xxx.1195  *.*
udp    21115    0  xxx.100.173.xxx.1194  *.*
udp      241    0  10.0.0.254.1195       *.*
udp     2988    0  10.0.0.254.1195       *.*
udp      193    0  xxx.100.173.xxx.1195  *.*
udp    19591    0  xxx.100.173.xxx.1195  *.*
udp      241    0  10.0.0.254.1195       *.*
udp    20043    0  xxx.100.173.xxx.1195  *.*
udp    11878    0  xxx.100.173.xxx.1195  *.*
udp      177    0  xxx.100.173.xxx.1195  *.*
udp      193    0  xxx.100.173.xxx.1195  *.*
udp      129    0  xxx.100.173.xxx.1194  *.*

> So if things are building up here rather than on the interface queue,
> there ought to be a reason why it's slow to drain.
>
> Are you doing queueing?

We are doing queueing as of very recently, but the same symptoms were
occurring before, when we had no queueing. We were also at 50Mbit until
very recently; the extra bandwidth did not help.

> How is fragmentation being handled? In OpenVPN or relying on the kernel
> to do it? Or are you using small mtu anyway to avoid frags?

We are not tuning for fragmentation, nor are we setting the MTU on the
endpoints.

> How does pfctl -si look?

sudo pfctl -si
Status: Enabled for 1 days 15:58:58           Debug: err

State Table                          Total             Rate
  current entries                     2694
  searches                       636512596         4422.1/s
  inserts                          2978926           20.7/s
  removals                         2977267           20.7/s
Counters
  match                            3349507           23.3/s
[snip: everything else 0.0/s]

> > I'm not sure how to proceed on tuning as I read tuning via sysctl is
> > becoming pointless.
>
> It's preferred if things can auto-tune without touching sysctl, but not
> everything is done that way.
>
> > net.inet.udp.sendspace=131028 # udp send buffer
>
> This may possibly need increasing though is already quite large.
> (while researching this mail it seems FreeBSD doesn't have this, does
> anyone here know what they do instead?)

We have toggled net.inet.udp.sendspace and net.inet.udp.recvspace between
131028 and 262144 with no improvement. Anything higher and we get a hosed
system:

ifconfig
ifconfig: socket: No buffer space available

> > net.inet.ip.ifq.maxlen=1536
>
> Monitor net.inet.ip.ifq.drops, is there an increase?

No increase in net.inet.ip.ifq.drops over time.

> This is already a fairly large buffer though (especially as I think you
> mentioned 100Mb). How did you choose 1536?

Google, and trial and error.

> > kern.bufcachepercent=90 # kernel buffer cache memory percentage
>
> This won't help OpenVPN. Is this box also doing other things?

This box is running IPsec. It's got four OpenVPN tunnels terminated on
it. We are running collectd, symon, and dhcpd. The load lives between
2 and 4.

2 users Load 3.09 3.07 2.91                    Fri Jul 18 12:34:19 2014

  PID USER     NAME     CPU   10\ 20\ 30\ 40\ 50\ 60\ 70\ 80\ 90\ 100\
 8941 root     acpi0   83.79 #################################################
      <idle>           66.65 #######################################
   22 root     openvpn  2.49 #
23727 root     openvpn  1.37
 5473 root     openvpn  1.27

load averages: 3.82, 3.08, 2.77                    fw0.xxx.xxx 12:00:21
86 processes: 85 idle, 1 on processor
CPU0 states:  0.0% user, 0.0% nice, 86.6% system,  8.8% interrupt,  4.6% idle
CPU1 states:  0.4% user, 0.0% nice,  7.8% system,  0.2% interrupt, 91.6% idle
CPU2 states:  0.0% user, 0.0% nice,  5.2% system,  0.2% interrupt, 94.6% idle
CPU3 states:  0.6% user, 0.0% nice,  5.0% system,  0.0% interrupt, 94.4% idle
Memory: Real: 196M/3236M act/tot Free: 28G Cache: 2752M Swap: 0K/4095M

  PID USERNAME PRI NICE  SIZE   RES STATE   WAIT     TIME    CPU COMMAND
23727 root       2    0   19M   13M sleep/3 poll   858:57  4.15% openvpn
   22 root       2    0   15M   10M sleep/2 poll     9:36  2.05% openvpn
31796 root       2    0 4364K 5024K sleep/1 poll     3:17  1.51% openvpn
14650 dkw        2    0   56M   46M sleep/1 kqread  18:08  0.78% tmux
10633 _iftop     2    0 6748K 7032K sleep/2 poll    12:38  0.20% iftop
11174 dkw       10    0  340K  112K sleep/3 nanosle  0:00  0.05% sleep
13647 named      2    0   43M   44M sleep/1 select  35:07  0.00% named
 5971 root       2    0  956K 2852K sleep/3 select   8:39  0.00% symux
 5473 root       2    0 3824K 4576K sleep/1 poll     5:25  0.00% openvpn
19387 root       2    0 2736K 4200K sleep/2 poll     5:21  0.00% openvpn
24267 _collect  10    0 3724K 2356K sleep/2 nanosle  3:28  0.00% collectd
31304 dkw       18    0 1316K 2984K sleep/1 pause    2:53  0.00% zsh
12919 _pflogd    4    0  872K  484K sleep/3 bpf      2:01  0.00% pflogd
18414 _symon    10    0  672K 1224K sleep/1 nanosle  1:17  0.00% symon
18218 root       2    0   12M 4516K sleep/2 select   1:00  0.00% dhcpd
13258 dkw        2    0 3808K 2768K sleep/2 select   0:55  0.00% sshd

> How does kern.netlivelocks look? (monitor it over time rather than just
> looking at the total).

sysctl -a | grep kern.netlivelocks
kern.netlivelocks=1

It has not changed over time.

-dkw

> > OpenBSD 5.5 (GENERIC.MP) #315: Wed Mar 5 09:37:46 MST 2014
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > real mem = 34301018112 (32712MB)
> > avail mem = 33379323904 (31833MB)
>
> thanks for including the full dmesg.
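Since the advice above is to monitor these counters over time rather than look at totals, here is a rough sketch (my own, not something from the thread) of a loop that logs the relevant counters once a minute so jumps can be lined up with the SIP dropouts. The sysctl names are the ones discussed above; the one-minute interval and the log format are arbitrary choices:

```shell
#!/bin/sh
# Hypothetical monitoring loop, not from the thread: log interface-queue
# drops, net livelocks, and the UDP full-buffer drop counter once a
# minute. Redirect stdout to a file to keep a history.
while :; do
    ts=$(date +%s)
    ifq=$(sysctl -n net.inet.ip.ifq.drops)
    ll=$(sysctl -n kern.netlivelocks)
    full=$(netstat -s -p udp | awk '/dropped due to full socket buffers/ {print $1}')
    # One line per sample; deltas between lines show where drops occur.
    echo "$ts ifq.drops=$ifq netlivelocks=$ll udp.fullbuf=$full"
    sleep 60
done
```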