Hi Andy,
sorry for the delay, but a lot of more important work came between your
mail and this answer ;).
You can set a simple prio on a rule like;
pass proto tcp from $left to $right set prio (1,4)
By PRIQ I mean the priq scheduler instead of cbq.
Here are the relevant lines of my current pf.conf rule set:
<pf.conf>
...
altq on em0 priq bandwidth 1000Mb queue { std_em0, tcp_ack_em0 }
queue std_em0 priq(default)
queue tcp_ack_em0 priority 6
altq on em1 priq bandwidth 1000Mb queue { std_em1, tcp_ack_em1 }
queue std_em1 priq(default)
queue tcp_ack_em1 priority 6
match out on em0 inet proto tcp from any to any queue(std_em0, tcp_ack_em0)
match out on em1 inet proto tcp from any to any queue(std_em1, tcp_ack_em1)
...
</pf.conf>
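For comparison, on newer OpenBSD releases the same ACK preference can be expressed without ALTQ via set prio. This is only a sketch assuming 5.5+ semantics (it will not parse on 5.2); the concrete prio values are illustrative:

```
# Sketch only: new-style prio, OpenBSD 5.5+ (not available on 5.2).
# With two values, the second one is applied to TCP ACKs and
# lowdelay packets, the first to everything else on the rule.
match on em0 inet proto tcp set prio (3, 6)
match on em1 inet proto tcp set prio (3, 6)
```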
I have read The Book of PF, 2nd edition, but there is nothing in it about
troubleshooting. What should I do to find the problem?
I have made some notes for troubleshooting purposes:
top -> interrupts -> high CPU or network interrupt load => hardware limit
systat -> interrupts on CPU and network cards => hardware limit
bwm-ng -> bandwidth near the theoretical limit => hardware limit
pfctl -si -> look at the current states; the default limit is 10000. The memory
counter shows failed memory allocations for states. If this number is
high and keeps increasing => raise the limit for states (pfctl -sm shows the states limit)
sysctl kern.netlivelocks -> a high number means something like two processes blocking
each other => hardware limit
If no problem can be found with the above steps:
- prioritize TCP ACKs for TCP traffic
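If pfctl -si shows the state count near the limit and the memory counter climbing, the state limit can be raised in pf.conf. A sketch; the value 100000 is only an example and should be tuned to the available RAM:

```
# pf.conf: raise the maximum number of state entries
# (default is 10000; 100000 is an illustrative value only)
set limit states 100000
```

After reloading the ruleset, pfctl -sm should show the new limit.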
Best Regards,
Patrick
On Thu, 9 Oct 2014, Andy wrote:
Hi,
Just so I understand what you have done, PRIQ is not the same as queuing.
You can set a simple prio on a rule like;
pass proto tcp from $left to $right set prio (1,4)
But this doesn't manage the situations where you have lots of different
types/profiles of traffic on your network.
For example you might have some big file transfers going on which can be
delayed and can have a high latency but high throughput, alongside your
control/real-time protocols which need low latency etc.
Generally in this situation just using prio won't always be enough, and your
file transfers will still swamp your interactive SSH or VNC connections etc.
So we do something like this;
altq on $if_trunk1 bandwidth 4294Mb hfsc queue { _wan }
oldqueue _wan on $if_trunk1 bandwidth 4290Mb priority 15 hfsc(linkshare
4290Mb, upperlimit 4290Mb) { _wan_rt, _wan_int, _wan_pri, _wan_vpn, _wan_web,
_wan_dflt, _wan_bulk }
oldqueue _wan_rt on $if_trunk1 bandwidth 20% priority 7 qlimit 50
hfsc(realtime(20%, 5000, 10%), linkshare 20%)
oldqueue _wan_int on $if_trunk1 bandwidth 10% priority 5 qlimit 100
hfsc(realtime 5%, linkshare 10%)
oldqueue _wan_pri on $if_trunk1 bandwidth 10% priority 4 qlimit 100
hfsc(realtime(15%, 2000, 5%), linkshare 10%)
oldqueue _wan_vpn on $if_trunk1 bandwidth 30% priority 3 qlimit 300
hfsc(realtime(15%, 2000, 5%), linkshare 30%)
oldqueue _wan_web on $if_trunk1 bandwidth 10% priority 2 qlimit 300
hfsc(realtime(10%, 3000, 5%), linkshare 10%)
oldqueue _wan_dflt on $if_trunk1 bandwidth 15% priority 1 qlimit 100
hfsc(realtime(10%, 5000, 5%), linkshare 15%, ecn, default)
oldqueue _wan_bulk on $if_trunk1 bandwidth 5% priority 0 qlimit 100
hfsc(linkshare 5%, upperlimit 30%, ecn, red)
altq on $if_trunk2 bandwidth 4294Mb hfsc queue { _wan }
oldqueue _wan on $if_trunk2 bandwidth 4290Mb priority 15 hfsc(linkshare
4290Mb, upperlimit 4290Mb) { _wan_rt, _wan_int, _wan_pri, _wan_vpn, _wan_web,
_wan_dflt, _wan_bulk }
oldqueue _wan_rt on $if_trunk2 bandwidth 20% priority 7 qlimit 50
hfsc(realtime(20%, 5000, 10%), linkshare 20%)
oldqueue _wan_int on $if_trunk2 bandwidth 10% priority 5 qlimit 100
hfsc(realtime 5%, linkshare 10%)
oldqueue _wan_pri on $if_trunk2 bandwidth 10% priority 4 qlimit 100
hfsc(realtime(15%, 2000, 5%), linkshare 10%)
oldqueue _wan_vpn on $if_trunk2 bandwidth 30% priority 3 qlimit 300
hfsc(realtime(15%, 2000, 5%), linkshare 30%)
oldqueue _wan_web on $if_trunk2 bandwidth 10% priority 2 qlimit 300
hfsc(realtime(10%, 3000, 5%), linkshare 10%)
oldqueue _wan_dflt on $if_trunk2 bandwidth 15% priority 1 qlimit 100
hfsc(realtime(10%, 5000, 5%), linkshare 15%, ecn, default)
oldqueue _wan_bulk on $if_trunk2 bandwidth 5% priority 0 qlimit 100
hfsc(linkshare 5%, upperlimit 30%, ecn, red)
pass quick proto { tcp, udp } from { (vlan1:network) } to { (vlan234:network)
} port { 4569, 5060, 10000:20000 } queue _wan_rt set prio 7
pass quick proto { tcp, udp } from { (vlan1:network) } to { (vlan234:network)
} port { 53, 123, 5900 } queue _wan_pri set prio 4
pass quick proto { tcp } from { (vlan1:network) } to { (vlan234:network) }
port { 80, 443 } queue (_wan_web,_wan_pri) set prio (2,4)
pass quick proto { tcp } from { (vlan1:network) } to { (vlan234:network) }
port { ssh } queue (_wan_bulk,_wan_int) set prio (0,5)
.
. All the other rules needing higher priority than the rest
.
pass quick proto { tcp, udp, icmp } from { (vlan1:network) } to {
(vlan234:network) } queue (_wan_bulk,_wan_pri) set prio (0,4)
NB: This is the old syntax for queues, and I strongly recommend reading the
3rd edition of "The Book of PF" (a must-read for *anyone* new or old to
OpenBSD and PF) :) and using the new syntax.
The rule I use is that whenever one queue starts to get used too much and
there is more than one type of traffic in it (in this example I
have DNS, NTP and VNC in the same queue), and they start to affect
each other, it's time to split the traffic out into further separate queues. So
here you would split VNC into its own queue to stop VNC swamping the DNS
queries :)
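Such a split might look like the following. A sketch only: the queue name and percentages are made up, and the new child would also have to be added to the parent _wan queue's child list:

```
# Hypothetical child queue for VNC, carved out of the existing shares
oldqueue _wan_vnc on $if_trunk1 bandwidth 5% priority 4 qlimit 100 hfsc(realtime 5%, linkshare 5%)
# ...and a matching rule placed ahead of the generic priority rules
pass quick proto tcp from { (vlan1:network) } to { (vlan234:network) } port 5900 queue _wan_vnc set prio 4
```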
The priority in these queues is not the same as PRIO. These "priority" values
don't have much impact *apparently* compared to the queues themselves (I
just understand these to be CPU or bucket scheduling or something), but I've
never understood how true that is, so I just set them to the same number
as the desired relative PRIO, as that seems sensible.
Last but NOT least: the PRIO value gets copied into the VLAN's CoS header! :)
So if you use VLANs like we do here on our trunks, the different packets will
end up as frames with the prio copied in, meaning your switches can then also
maintain the layer-3 QoS in the layer-2 CoS... Amazing stuff :)
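For example (a sketch; the interface name and prio value here are illustrative), a rule like this on a vlan(4) interface ends up tagging the outgoing frames with the matching 802.1p priority:

```
# prio 0-7 is copied into the 802.1p priority bits of the tagged frame,
# so downstream switches can honour it as CoS
pass on vlan234 proto tcp to port 5900 set prio 5
```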
Good luck
Andrew Lemin
*** looking forward to 64bit queues! :) ***
On 08/10/14 20:49, jum...@yahoo.de wrote:
Hi Andy,
This morning I have added Priority Queueing (PRIQ) to the ruleset and
prefer TCP ACK packets over everything else. I can see the queues with
systat queue but the change has no effect on the user experience nor the
throughput.
I have read something about adjusting TCP send and receive window size
settings, but OpenBSD has done this automatically since 2010 [1]. What else can I
set?
Best Regards,
Patrick
[1] http://marc.info/?l=openbsd-misc&m=128905075911814
On Thu, 2 Oct 2014, jum...@yahoo.de wrote:
Hi Andy,
Setup some queues and prioritise your ACK's ;)
Good idea, I will try to implement Priority Queueing with the old altq syntax.
Best Regards,
Patrick
On Thu, 2 Oct 2014, Andy wrote:
Setup some queues and prioritise your ACK's ;)
The box is fine under the load I'm sure, but you'll still need to
prioritise those TCP acknowledgments to make things snappy when lots of
traffic is going on..
On 02/10/14 17:13, Ville Valkonen wrote:
Hello Patrick,
On 2 October 2014 17:32, Patrick <jum...@yahoo.de> wrote:
Hi,
I use an OpenBSD-based firewall (version 5.2, I know I should upgrade
but ...) between an 8-host cluster of Linux servers and 300 clients which
access this cluster via VNC. Each server is connected with one
gigabit port to a dedicated switch, and the firewall has one gigabit
port on each side (dedicated switch and campus network).
The users complain about slow VNC response times (if I connect a
client system to the dedicated switch, the access is faster, even
during peak hours), and the admins of the cluster blame my firewall :(.
I use MRTG for traffic monitoring (data retrieved from OpenBSD at
one-minute intervals) and can see average traffic of 160 Mbit/s during
office hours and peaks at 280 Mbit/s. With bwm-ng and a five-second
interval I can see peaks of 580 Mbit/s. The peak packet rate is
around 80000 packets per second (also measured with bwm-ng). The interrupt
load of CPU0 is at peak 25%. So with this data I don't think the firewall is at
the limit, am I right?
The server is a standard Intel Xeon (E3-1220V2, 4 cores, 3.10 GHz) with
4 GByte of memory and four 1 Gbit/s copper Ethernet Intel NICs (em
driver).
Where is the problem? Can't the NICs handle more packets per second? How
can I check for this?
If I connect a client system directly to the dedicated switch, the
response times are better.
Thanks for your help,
Patrick
In addition to a dmesg, could you please provide the following
information:
$ pfctl -si
$ sysctl kern.netlivelocks
Interrupt statistics (from systat, for example) would also be helpful.
Thanks!
--
Regards,
Ville