On Fri, Jul 18, 2014 at 09:36:23AM +0000, Stuart Henderson wrote:
> On 2014-07-17, Darryl Wisneski <s...@commwebworks.com> wrote:
> > netstat -s -p udp |grep "dropped due to full socket" 
> > 345197 dropped due to full socket buffers
> 
> We're assuming this relates to openvpn packets but you can check which
> sockets have queues: (the sendq/recvq counters).
> 
> netstat -an | grep -v ' 0      0'

Now sockets will sometimes display a queue while netstat -s shows the
"dropped due to full socket buffers" counter increasing.  We usually
have a SIP drop at the same time.  The two are not always correlated,
but mostly they are (a sketch for watching the two together follows
the listing).

udp       8846      0  xxx.100.173.xxx.1195    *.*                   
udp       1448      0  xxx.100.173.xxx.1195    *.*                   
udp        129      0  xxx.100.173.xxx.1194    *.*                   
udp        177      0  xxx.100.173.xxx.1195    *.*                   
udp        354      0  xxx.100.173.xxx.1195    *.*                   
udp      21115      0  xxx.100.173.xxx.1194    *.*                   
udp        241      0  10.0.0.254.1195       *.*                   
udp       2988      0  10.0.0.254.1195       *.*                   
udp        193      0  xxx.100.173.xxx.1195    *.*                   
udp      19591      0  xxx.100.173.xxx.1195    *.*                   
udp        241      0  10.0.0.254.1195       *.*                   
udp      20043      0  xxx.100.173.xxx.1195    *.*                   
udp      11878      0  xxx.100.173.xxx.1195    *.*                   
udp        177      0  xxx.100.173.xxx.1195    *.*                   
udp        193      0  xxx.100.173.xxx.1195    *.*                   
udp        129      0  xxx.100.173.xxx.1194    *.*  
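
For anyone who wants to watch the two together, a rough sketch (not
our exact tooling; the awk field numbers assume the Recv-Q/Send-Q
columns shown above, and the interval is arbitrary):

    #!/bin/sh
    # Sample the global drop counter and any UDP sockets holding a
    # non-empty send or receive queue, once per interval.
    while :; do
        date
        netstat -s -p udp | grep 'dropped due to full socket'
        netstat -an -p udp | awk '$2 > 0 || $3 > 0'
        sleep 5
    done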

> 
> So if things are building up here rather than on the interface queue,
> there ought to be a reason why it's slow to drain.
> 
> Are you doing queueing?

We started queueing very recently, but the same symptoms were
occurring before, when we had no queueing.  We were also at 50Mbit
until very recently; the extra bandwidth did not help.
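
For reference, queueing with the new pf.conf(5) syntax in 5.5 looks
roughly like this; a sketch only, with illustrative interface name,
bandwidths, and queue names:

    # Reserve bandwidth for the tunnel/SIP traffic so bulk flows
    # cannot starve it.
    queue rootq on em0 bandwidth 100M max 100M
    queue voip  parent rootq bandwidth 20M min 10M
    queue bulk  parent rootq bandwidth 80M default

    # Steer the OpenVPN UDP traffic into the reserved queue.
    pass out on em0 proto udp to port { 1194 1195 } set queue voip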

> 
> How is fragmentation being handled? In OpenVPN or relying on the kernel
> to do it? Or are you using small mtu anyway to avoid frags?

We are not tuning for fragmentation, nor are we setting an MTU on
the endpoints.
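
If fragmentation handling were pushed into OpenVPN instead, my
understanding is it would look something like this in the tunnel
config on both ends (fragment and mssfix are standard OpenVPN 2.x
directives; 1300 is an illustrative value):

    # Fragment anything over 1300 bytes inside OpenVPN and clamp
    # TCP MSS to match, so the encapsulated UDP datagrams never
    # need kernel-level IP fragmentation.
    fragment 1300
    mssfix 1300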

> 
> How does pfctl -si look?

> sudo  pfctl -si
Status: Enabled for 1 days 15:58:58              Debug: err

State Table                          Total             Rate
  current entries                     2694               
  searches                       636512596         4422.1/s
  inserts                          2978926           20.7/s
  removals                         2977267           20.7/s
Counters
  match                            3349507           23.3/s

[snip]

everything else 0.0/s

> 
> > I'm not sure how to proceed on tuning as I read tuning via sysctl is
> > becoming pointless.
> 
> It's preferred if things can auto-tune without touching sysctl, but not
> everything is done that way.
> 
> > net.inet.udp.sendspace=131028   # udp send buffer
> 
> This may possibly need increasing though is already quite large. (while
> researching this mail it seems FreeBSD doesn't have this, does anyone here
> know what they do instead?)

We have toggled net.inet.udp.sendspace and net.inet.udp.recvspace
between 131028 and 262144 with no improvement.  Anything higher and
we get a hosed system:

> ifconfig 
ifconfig: socket: No buffer space available
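
For completeness, the toggling above amounts to these lines in
/etc/sysctl.conf (262144 being the highest value that stayed stable):

    # UDP socket buffers; values above 262144 hosed the box
    # ("ifconfig: socket: No buffer space available").
    net.inet.udp.sendspace=262144
    net.inet.udp.recvspace=262144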

> 
> > net.inet.ip.ifq.maxlen=1536
> 
> Monitor net.inet.ip.ifq.drops, is there an increase?

No increase in net.inet.ip.ifq.drops over time.
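
A small loop like this is enough to check; a sketch, with
watch-sysctl.sh a hypothetical name and the interval arbitrary:

    #!/bin/sh
    # watch-sysctl.sh: print a timestamped sample of the given
    # sysctl MIB once a minute so any increase stands out.
    # Usage: sh watch-sysctl.sh net.inet.ip.ifq.drops
    while :; do
        echo "$(date '+%F %T') $(sysctl "$1")"
        sleep 60
    done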

> This is already a fairly large buffer though (especially as I think you
> mentioned 100Mb). How did you choose 1536?

Google, plus trial and error.

> 
> > kern.bufcachepercent=90         # kernel buffer cache memory percentage
> 
> This won't help OpenVPN. Is this box also doing other things?

This box is also running IPsec.

It has four OpenVPN tunnels terminated on it.

We are running collectd, symon, and dhcpd.

The load lives between 2 and 4.

    2 users    Load 3.09 3.07 2.91                     Fri Jul 18 12:34:19 2014

     PID USER             NAME                          CPU    10\   20\   30\   40\   50\   60\   70\   80\   90\  100\
    8941 root             acpi0                       83.79 #################################################
                          <idle>                      66.65 #######################################
      22 root             openvpn                      2.49 #
   23727 root             openvpn                      1.37
    5473 root             openvpn                      1.27


load averages:  3.82,  3.08,  2.77                    fw0.xxx.xxx 12:00:21
86 processes: 85 idle, 1 on processor
CPU0 states:  0.0% user,  0.0% nice, 86.6% system,  8.8% interrupt,  4.6% idle
CPU1 states:  0.4% user,  0.0% nice,  7.8% system,  0.2% interrupt, 91.6% idle
CPU2 states:  0.0% user,  0.0% nice,  5.2% system,  0.2% interrupt, 94.6% idle
CPU3 states:  0.6% user,  0.0% nice,  5.0% system,  0.0% interrupt, 94.4% idle
Memory: Real: 196M/3236M act/tot Free: 28G Cache: 2752M Swap: 0K/4095M

  PID USERNAME PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
23727 root       2    0   19M   13M sleep/3   poll    858:57  4.15% openvpn
   22 root       2    0   15M   10M sleep/2   poll      9:36  2.05% openvpn
31796 root       2    0 4364K 5024K sleep/1   poll      3:17  1.51% openvpn
14650 dkw        2    0   56M   46M sleep/1   kqread   18:08  0.78% tmux
10633 _iftop     2    0 6748K 7032K sleep/2   poll     12:38  0.20% iftop
11174 dkw       10    0  340K  112K sleep/3   nanosle   0:00  0.05% sleep
13647 named      2    0   43M   44M sleep/1   select   35:07  0.00% named
 5971 root       2    0  956K 2852K sleep/3   select    8:39  0.00% symux
 5473 root       2    0 3824K 4576K sleep/1   poll      5:25  0.00% openvpn
19387 root       2    0 2736K 4200K sleep/2   poll      5:21  0.00% openvpn
24267 _collect  10    0 3724K 2356K sleep/2   nanosle   3:28  0.00% collectd
31304 dkw       18    0 1316K 2984K sleep/1   pause     2:53  0.00% zsh
12919 _pflogd    4    0  872K  484K sleep/3   bpf       2:01  0.00% pflogd
18414 _symon    10    0  672K 1224K sleep/1   nanosle   1:17  0.00% symon
18218 root       2    0   12M 4516K sleep/2   select    1:00  0.00% dhcpd
13258 dkw        2    0 3808K 2768K sleep/2   select    0:55  0.00% sshd

> 
> How does kern.netlivelocks look? (monitor it over time rather than just
> looking at the total).

> sysctl -a |grep kern.netlivelocks                                             
kern.netlivelocks=1

It has not changed over time.
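
(Sampled the same way as net.inet.ip.ifq.drops above, e.g.
sh watch-sysctl.sh kern.netlivelocks.)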

-dkw

> 
> > OpenBSD 5.5 (GENERIC.MP) #315: Wed Mar  5 09:37:46 MST 2014
> >     dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > real mem = 34301018112 (32712MB)
> > avail mem = 33379323904 (31833MB)
> 
> thanks for including the full dmesg.
