The problem exists even if I use the system's "/usr/bin/false" and "/usr/bin/true" commands. The problem exists even when PF is disabled or the only rule is "pass in".
That being said, the script itself is a simple host lookup against the IP
addresses to ensure the DNS server is actually resolving. Again, just using
"/usr/bin/false" or "/usr/bin/true" produces the same drop in throughput.

An example of the drop looks like this:

9913Mbps
9913Mbps
7253Mbps <--- script interval point
9913Mbps
9913Mbps
...etc...

When the script is an actual shell script rather than /usr/bin/false, the
throughput drop spans the three seconds surrounding the time the script runs.

9913Mbps
9913Mbps
4321Mbps
7253Mbps <--- script interval point
5162Mbps
9913Mbps
9913Mbps

# relayd.conf (somewhat sterilized):
table <dns-servers> { 192.168.1.1, 192.168.1.2 }

redirect dns-udp {
    listen on 192.168.100.1 udp port 53
    forward to <dns-servers> port 53 \
        check script "/usr/bin/false" \
        timeout 4000 \
        interval 15 \
        mode roundrobin
}

redirect dns-tcp {
    listen on 192.168.100.1 port 53
    forward to <dns-servers> port 53 \
        check script "/usr/bin/false" \
        timeout 4000 \
        interval 15 \
        mode roundrobin
}

On Mon, Jul 30, 2012 at 8:59 AM, Gregory Edigarov <ediga...@cupid.com> wrote:
> On 07/30/2012 03:25 PM, Bennett Samowich wrote:
>>
>> I've uncovered a troubling performance symptom that I believe is
>> related to relayd's "check script" functionality.
>>
>> The system is a Dell R710 with 12GB RAM and 10Gb interfaces. The
>> problem is that when relayd is running with redirects that uses the
>> check script functionality, performance of the interface drops around
>> 30% while the check script is running.
>>
>> I ran the tests in an offline configuration so no other traffic could
>> be a factor ( test1 <--> OpenBSD <--> test2 ). Tests were performed
>> using the nuttcp tool and both servers ( test1 & test2 ) pull
>> line-rate 9.912Gbps when connected back-to-back. When run through the
>> OpenBSD firewall, regardless of PF rules, the rate drops to 7.25Gbps
>> when the script runs.
>>
>> At first I thought it was my script but I replaced my script with
>> 'true', 'false' and the problem still remained. I've validated that
>> this exists in versions 4.8 through 5.1. I've also tried looking at
>> the relayd code but it seemed like a reasonable exec call. I can't
>> seem to understand why a running script would cause a network
>> performance drop. I would also bet that this only noticeable over
>> 10Gb interfaces. Nevertheless, with check script running every 15
>> seconds we've succumbed to an overall drop in network performance.
>
> Sorry, you do not give a full information. What's in your script? what's in
> your relayd.conf?
> what are your pf rules? dmesg is also welcome.
>
>> Any insight or direction would be greatly appreciated.
>>
>> Bennett
>>
> --
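
As a rough cross-check of the "around 30%" figure, the relative drop for each dipped sample quoted above can be worked out with a few lines of shell. This is only a sketch: the pct_drop helper and the variable names are made up for illustration, and the 9913Mbps baseline is simply the steady-state value from the samples.

```shell
#!/bin/sh
# Steady-state throughput in Mbps, taken from the nuttcp samples in this thread.
baseline=9913

# pct_drop SAMPLE
# Hypothetical helper: print how far SAMPLE (Mbps) falls below $baseline,
# as a percentage with one decimal place.
pct_drop() {
    awk -v b="$baseline" -v s="$1" 'BEGIN { printf "%.1f\n", (b - s) * 100 / b }'
}

# The three dipped samples seen around the script interval point.
for sample in 7253 4321 5162; do
    printf '%s Mbps: %s%% below baseline\n' "$sample" "$(pct_drop "$sample")"
done
```

By this arithmetic the 7253Mbps sample sits a little under 27% below the baseline, which is consistent with the "around 30%" description, while the 4321Mbps and 5162Mbps samples seen with a real shell script are noticeably deeper dips.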