On 28.09.2010, at 10:54, Jurgen Weber <[email protected]> wrote:

> Hello List
> 
> We have been having issues with some firewall machines of ours using pfSense.
> 
> FreeBSD smash01.ish.com.au 7.2-RELEASE-p5 FreeBSD 7.2-RELEASE-p5 #0: Sun Dec  
> 6 23:20:31 EST 2009 
> sullr...@freebsd_7.2_pfsense_1.2.3_snaps.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.7
>   i386
> 
> MotherBoard: 
> http://www.supermicro.com/products/motherboard/Xeon3000/3200/X7SBi-LN4.cfm
> 
> Originally the systems started out by showing a lot of packet loss, the 
> system time would fall behind, and the value of "#vmstat -i | grep timer" was 
> dropping below 2000. I was led to believe by the guys at pfSense that this 
> is where the value should sit. I would also receive errors in messages that 
> looked like " kernel: calcru: runtime went backwards from 244314 usec to 
> 236341".
> 
> We tried a variety of things, disabling USB, turning off the Intel Speed Step 
> in the BIOS, disabling ACPI, etc, etc. All having little to no effect. The 
> only thing that would right it is restarting the box but over time it would 
> degrade again. I talked to SuperMicro and they said that this is a 
> FreeBSD issue and pretty much washed their hands of it.
> 
> After a couple of months of dealing with this and just rebooting the systems 
> regularly, the symptoms slowly but surely disappeared, e.g. the kernel messages 
> went away, the system time was not falling behind and I was experiencing no 
> packet loss but the "#vmstat -i | grep timer" value would continue to 
> decrease over time. Eventually I think, when it finally got to 0, the machine 
> restarted (I am only guessing here).
> 
> After this restart it worked again for a couple of hours and then it 
> restarted again.
> 
> After the second time the system has not missed a beat, it has been fine and 
> the "#vmstat -i | grep timer" value remained near the 2000 mark... We set up 
> some zabbix monitoring to watch it. As mentioned it was fine for about a 
> month. Until today. Today the value has dropped to 0, but the system has not 
> restarted and over the last couple of hours the value has increased to 47.
> 
> This machine is mission critical, we have two in a fail over scenario (using 
> pfSense's CARP features) and it seems unfortunate that we have an issue with 
> two brand new SuperMicro boxes that affects both machines. While at the moment 
> everything seems fine I want to ensure that I have no further issues. Does 
> anyone have any suggestions?
> 
> Lastly I have double-checked both of the below:
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html#CALCRU-NEGATIVE-RUNTIME
> We disabled EIST.
> 
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html#COMPUTER-CLOCK-SKEW
> 
> # dmesg | grep Timecounter
> Timecounter "i8254" frequency 1193182 Hz quality 0
> Timecounters tick every 1.000 msec
> # sysctl kern.timecounter.hardware
> kern.timecounter.hardware: i8254
> 
> Only have one timer to choose from.
> 
> Thanks
> 
> Jurgen
> 
> _______________________________________________
> [email protected] mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "[email protected]"


Hello,
vmstat -i calculates the interrupt rate as interrupt count / uptime, and the 
interrupt count is a 32-bit integer. 
With high values of kern.hz it will overflow in a few days (with kern.hz=4000 
it will happen every 12 days or so), and once the count wraps the rate shown 
by vmstat -i drops with it.
If that is the case, use systat -vmstat 1 to get an accurate interrupt rate.
This is just FYI: I was confused by this once and it scared me a bit, and I 
started changing counters until I noticed it.
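
To make the arithmetic concrete, here is a rough Python sketch. It is not how vmstat is implemented; it only assumes what is described above (a 32-bit counter and a rate computed as count / uptime). The function names are made up for illustration.

```python
# Sketch only: models a 32-bit interrupt counter and an average rate
# computed as count / uptime. Names here are hypothetical, not vmstat code.

COUNTER_BITS = 32
WRAP = 2 ** COUNTER_BITS          # counter wraps to 0 at this value
SECONDS_PER_DAY = 86400


def days_until_wrap(interrupts_per_second):
    """Days of uptime before the counter overflows at a steady rate."""
    return WRAP / interrupts_per_second / SECONDS_PER_DAY


def reported_rate(interrupts_per_second, uptime_seconds):
    """Average rate as count // uptime, with the count truncated to 32 bits."""
    true_count = interrupts_per_second * uptime_seconds
    wrapped_count = true_count % WRAP
    return wrapped_count // uptime_seconds


if __name__ == "__main__":
    for rate in (1000, 2000, 4000):
        print(f"{rate}/s: counter wraps after ~{days_until_wrap(rate):.1f} days")
    for days in (20, 25):
        shown = reported_rate(2000, days * SECONDS_PER_DAY)
        print(f"2000/s after {days} days of uptime: reported rate {shown}")
```

At 4000 interrupts per second the counter wraps after roughly 12.4 days. At 2000 per second it takes roughly 24.9 days, after which the computed average falls from 2000 to a very small number even though the real interrupt rate has not changed.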

P.S. Please forgive my poor English.