The problem exists even if I use the system's "/usr/bin/false" and
"/usr/bin/true" commands.
The problem exists even when PF is disabled or the only rule is "pass in".
That being said, the script itself is a simple host lookup against the
IP addresses to ensure the DNS server is actually resolving. Again,
just using "/usr/bin/false" or "/usr/bin/true" produces the same drop
in throughput.
An example of the drop looks like this:
9913Mbps
9913Mbps
7253Mbps <--- script interval point
9913Mbps
9913Mbps
...etc...
When the script is an actual shell script rather than /usr/bin/false,
the throughput drop spans the three seconds surrounding the time the
script runs.
9913Mbps
9913Mbps
4321Mbps
7253Mbps <--- script interval point
5162Mbps
9913Mbps
9913Mbps
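For a rough sense of scale, the samples above can be turned into percentages. This is only a back-of-the-envelope sketch: it assumes one throughput sample per second, a 15-second check interval (as in the relayd.conf below), and that every second not shown in the listings runs at the full 9913Mbps. The function names are made up for illustration.

```python
# Back-of-the-envelope numbers from the throughput listings in this message.
# Assumptions (not measured): one sample per second, one check every 15
# seconds, and every second outside the dip runs at the full line rate.

LINE_RATE_MBPS = 9913
INTERVAL_SECONDS = 15


def percent_drop(baseline, observed):
    """Relative drop of `observed` below `baseline`, in percent."""
    return 100.0 * (baseline - observed) / baseline


def window_average(dip_samples, baseline=LINE_RATE_MBPS, window=INTERVAL_SECONDS):
    """Average rate over one check interval that contains `dip_samples`."""
    steady_seconds = window - len(dip_samples)
    return (steady_seconds * baseline + sum(dip_samples)) / window


# Dip seen with /usr/bin/false, and the wider dip seen with a real script.
false_dip = [7253]
script_dip = [4321, 7253, 5162]

if __name__ == "__main__":
    print("worst second: %.1f%%" % percent_drop(LINE_RATE_MBPS, 7253))
    for name, dip in (("/usr/bin/false", false_dip), ("shell script", script_dip)):
        avg = window_average(dip)
        print("%s: avg %.0f Mbps, %.1f%% below line rate"
              % (name, avg, percent_drop(LINE_RATE_MBPS, avg)))
```

Under those assumptions the worst single second is about 26.8% below line rate. Averaged over a whole 15-second interval, the loss comes to roughly 1.8% with /usr/bin/false and roughly 8.7% with the real shell script.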
# relayd.conf (somewhat sterilized):
table <dns-servers> { 192.168.1.1, 192.168.1.2 }

redirect dns-udp {
    listen on 192.168.100.1 udp port 53
    forward to <dns-servers> port 53 \
        check script "/usr/bin/false" \
        timeout 4000 \
        interval 15 \
        mode roundrobin
}

redirect dns-tcp {
    listen on 192.168.100.1 port 53
    forward to <dns-servers> port 53 \
        check script "/usr/bin/false" \
        timeout 4000 \
        interval 15 \
        mode roundrobin
}

On Mon, Jul 30, 2012 at 8:59 AM, Gregory Edigarov <[email protected]> wrote:
> On 07/30/2012 03:25 PM, Bennett Samowich wrote:
>>
>> I've uncovered a troubling performance symptom that I believe is
>> related to relayd's "check script" functionality.
>>
>> The system is a Dell R710 with 12GB RAM and 10Gb interfaces. The
>> problem is that when relayd is running with redirects that use the
>> check script functionality, performance of the interface drops around
>> 30% while the check script is running.
>>
>> I ran the tests in an offline configuration so no other traffic could
>> be a factor ( test1 <--> OpenBSD <--> test2 ). Tests were performed
>> using the nuttcp tool and both servers ( test1 & test2 ) pull
>> line-rate 9.912Gbps when connected back-to-back. When run through the
>> OpenBSD firewall, regardless of PF rules, the rate drops to 7.25Gbps
>> when the script runs.
>>
>> At first I thought it was my script but I replaced my script with
>> 'true', 'false' and the problem still remained. I've validated that
>> this exists in versions 4.8 through 5.1. I've also tried looking at
>> the relayd code but it seemed like a reasonable exec call. I can't
>> seem to understand why a running script would cause a network
>> performance drop. I would also bet that this is only noticeable over
>> 10Gb interfaces. Nevertheless, with check script running every 15
>> seconds we've succumbed to an overall drop in network performance.
>
> Sorry, you did not give full information. What's in your script? What's in
> your relayd.conf?
> What are your pf rules? dmesg is also welcome.
>
>> Any insight or direction would be greatly appreciated.
>>
>> Bennett
>>
> --