I have been using PF on OpenBSD as a firewall box without any problems. I
have two boxes in a redundant configuration with CARP. Later, I
needed to load balance both http and https using hoststated.
However, load balancing does not seem to be stable. In my case, it
works for about a week. Then hoststated starts seeing the primary web
server as down and tries to use the backup web server. Sometimes it
fails over to the backup server; sometimes it sees the backup as down
as well. Although both of my web servers are up and running, it never
sees them as up again, and load balancing just stays in the down state.
I tried to reload hoststated, but it did not make any difference. I also
tried to stop hoststated, but it failed to stop. Disabling and
re-enabling PF did not make any difference either. The only way to
recover once hoststated is down is to reboot the boxes. Then the cycle
starts again: it runs well for a week and then the same thing happens.
While troubleshooting, the only thing I noticed is that the total memory
usage reported by "top" keeps growing. I have 2 GB of physical
memory on the boxes. However, my observation is that once the total
reaches about the 260M level, it may fail at any time.
Here is the top output from a newly rebooted box:
load averages: 0.11, 0.09, 0.08    11:16:54
36 processes: 35 idle, 1 on processor
CPU0 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU1 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Memory: Real: 12M/97M act/tot Free: 3865M Swap: 0K/3585M used/tot

  PID USERNAME PRI NICE  SIZE   RES STATE   WAIT     TIME    CPU COMMAND
    7 root      10    0    0K   84M sleep/0 pftm     0:00  0.00% pfpurge
    9 root     -18    0    0K   84M idle    reaper   0:00  0.00% reaper
   11 root      18    0    0K   84M sleep/0 syncer   0:00  0.00% update
    0 root     -18    0    0K   84M sleep/0 schedul  0:00  0.00% swapper
   13 root      14    0    0K   84M idle    crypto_  0:00  0.00% crypto
   10 root     -13    0    0K   84M idle    cleaner  0:00  0.00% cleaner
    4 root      10    0    0K   84M idle    usbevt   0:00  0.00% usb0
    6 root      10    0    0K   84M idle    usbevt   0:00  0.00% usb1
   12 root     -18    0    0K   84M idle    aiodone  0:00  0.00% aiodoned
    8 root     -18    0    0K   84M idle    pgdaemo  0:00  0.00% pagedaemon
    5 root      10    0    0K   84M idle    usbtsk   0:00  0.00% usbtask
    3 root      10    0    0K   84M idle    bored    0:00  0.00% syswq
    2 root     -18    0    0K   84M idle    kmalloc  0:00  0.00% kmthread
  640 root       2    0 2252K 4480K sleep/0 select   0:00  0.00% snmpd
24451 _hoststa   2    0 1356K 2456K idle    kqread   0:12  0.00% hoststated

The following is from a box rebooted yesterday (note the increased
memory usage of the system processes under the RES column):
load averages: 0.16, 0.14, 0.09    11:18:01
37 processes: 1 running, 35 idle, 1 on processor
CPU0 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU1 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Memory: Real: 13M/206M act/tot Free: 1802M Swap: 0K/3584M used/tot

  PID USERNAME PRI NICE  SIZE   RES STATE   WAIT     TIME    CPU COMMAND
    8 root      10    0    0K  193M sleep/0 pftm     0:00  0.00% pfpurge
    4 root      10    0    0K  193M sleep/0 ipmi_po  0:00  0.00% ipmi0
   10 root     -18    0    0K  193M idle    reaper   0:00  0.00% reaper
   12 root      18    0    0K  193M sleep/0 syncer   0:00  0.00% update
    0 root     -18    0    0K  193M sleep/0 schedul  0:00  0.00% swapper
   14 root      14    0    0K  193M idle    crypto_  0:00  0.00% crypto
   11 root     -13    0    0K  193M idle    cleaner  0:00  0.00% cleaner
    5 root      10    0    0K  193M sleep/0 usbevt   0:00  0.00% usb0
    7 root      10    0    0K  193M sleep/0 usbevt   0:00  0.00% usb1
   13 root     -18    0    0K  193M idle    aiodone  0:00  0.00% aiodoned
    9 root     -18    0    0K  193M idle    pgdaemo  0:00  0.00% pagedaemon
    6 root      10    0    0K  193M idle    usbtsk   0:00  0.00% usbtask
    3 root      10    0    0K  193M idle    bored    0:00  0.00% syswq
    2 root     -18    0    0K  193M idle    kmalloc  0:00  0.00% kmthread
29491 root       2    0 2308K 4536K sleep/0 select   0:03  0.00% snmpd
22579 _hoststa   2    0 2280K 2832K idle    kqread   3:11  0.00% hoststated

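To keep an eye on the growth, the total figure could be pulled out of the "Memory:" line of saved top output with something like the Python sketch below. This is only an illustration: the names parse_memory_line, near_failure and WARN_TOTAL_MB are made up here, the pattern assumes the line looks exactly like the ones pasted above, and the 260M threshold is just my own observation, not a documented limit.

```python
import re

# Matches the "Memory:" summary line as it appears in the top output above, e.g.
# "Memory: Real: 13M/206M act/tot Free: 1802M Swap: 0K/3584M used/tot"
MEM_RE = re.compile(
    r"Memory: Real: (\d+)([KMG])/(\d+)([KMG]) act/tot\s+Free: (\d+)([KMG])"
)

# Conversion factors from the unit suffix to megabytes.
UNIT_MB = {"K": 1 / 1024, "M": 1, "G": 1024}

# Assumption: the level at which failures were observed on these boxes.
WARN_TOTAL_MB = 260


def parse_memory_line(line):
    """Return (active_mb, total_mb, free_mb), or None if the line does not match."""
    m = MEM_RE.search(line)
    if m is None:
        return None
    act, act_u, tot, tot_u, free, free_u = m.groups()
    return (
        int(act) * UNIT_MB[act_u],
        int(tot) * UNIT_MB[tot_u],
        int(free) * UNIT_MB[free_u],
    )


def near_failure(line, threshold_mb=WARN_TOTAL_MB):
    """True if the 'tot' figure on the line has reached the threshold."""
    parsed = parse_memory_line(line)
    return parsed is not None and parsed[1] >= threshold_mb
```

Feeding it the two "Memory:" lines above gives totals of 97 and 206, both still under the 260M level, so near_failure would report False for each.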
Rami