I have placed my responses inline.

Thanks


Bryant Zimmerman 
----------------------------------------

From: "Jim Pingle" <[email protected]>
Sent: Friday, January 06, 2012 10:06 AM
To: "pfSense support and discussion" <[email protected]>
Subject: Re: [pfSense] Carp locking up routers.


On 1/6/2012 9:40 AM, Bryant Zimmerman wrote:

> I have had another lockup with CARP. This is on three identical
> Supermicro systems running 2.0-RELEASE with 4 Intel NIC ports in each.
>
> What happens is the pfSense routers just stop sending or receiving
> traffic on all IP addresses, CARP and non-CARP.
> We could not access either unit from the net. I hooked in a console
> cable and the hardware was still responding, but I did not have enough
> time to do any diagnostics. I had to reboot all of them to get traffic
> flowing again. And since they are nanobsd I have no saved logs after the
> reboot. This is really rendering CARP useless and I need some ideas on
> how to solve this.


No errors at all on the console?

I did not see any errors, but I only had a few minutes to look because I
had to get the system back online.


Only issue I'm aware of that can lock at the moment isn't CARP related,
it's IPv6 related so only affects 2.1.

This is 2.0-RELEASE, not 2.1, and I am not using IPv6 yet.


What kind of network interfaces do you have?

Two igb and two em. The igb ports are on a dual-port Intel NIC and the em
ports are built-in Intel.


Any bridging, lagg, vlans, etc?

I have 11 VLANs and 1 WAN. The WAN port on each router has a static public
address, and there is one shared CARP address on the WAN.

All of the VLANs are on the igb1 interface. The WAN is on the igb0
interface, em0 is CARP, and em1 is a management VLAN.


Any indications in the RRD graphs of something leading up to the lock?

RRD graphs are set to dump to the flash every 2 hours, but after the reboot
there was no data there at all. I had checked the routers 40 minutes before
the lockup, and CPU, memory, and state tables looked good. I am really
perplexed as to why all of my RRD data from before the reset is gone.


Setting up a syslog server is a good suggestion in case something is 
logged.

I am working on that.
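
For anyone following along, a remote log receiver can be very small. The
sketch below is only an illustration, not the setup used here: it assumes the
firewalls send standard BSD-style syslog over UDP to port 514, and the names
LOG_PATH, parse_pri, SyslogHandler, and serve are hypothetical ones made up
for this example. Binding to port 514 usually requires root privileges.

```python
import socketserver

# Hypothetical destination file for received messages.
LOG_PATH = "pfsense-remote.log"


def parse_pri(message):
    """Split a BSD syslog line into (facility, severity, rest).

    The leading <N> field encodes facility * 8 + severity.
    Returns (None, None, message) if no valid <N> prefix is present.
    """
    if message.startswith("<") and ">" in message[:6]:
        end = message.index(">")
        pri_text = message[1:end]
        if pri_text.isdigit():
            pri = int(pri_text)
            return pri // 8, pri % 8, message[end + 1:]
    return None, None, message


class SyslogHandler(socketserver.BaseRequestHandler):
    """Append each received datagram to LOG_PATH with the sender's address."""

    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        text = data.decode("utf-8", errors="replace").rstrip()
        facility, severity, rest = parse_pri(text)
        with open(LOG_PATH, "a", encoding="utf-8") as log:
            log.write(
                f"{self.client_address[0]} fac={facility} sev={severity} {rest}\n"
            )


def serve(host="0.0.0.0", port=514):
    """Listen for syslog datagrams until interrupted."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling serve() on a machine that stays up through the lockup would keep
whatever the routers manage to log before they stop passing traffic. A real
syslog daemon is the better long-term choice; this only shows the idea.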


You might also try (I can't remember if we enable this and I don't have
a HW firewall with a keyboard I can break at hand) to press ctrl-alt-esc
to see if it breaks into the debugger.

I don't have keyboards or monitors on these, as they are nanobsd installs
and the console is disabled.


If it does get you to a db prompt, you can get a crash dump like so:


textdump set; capture on; run lockinfo; show pcpu; bt; ps; alltrace;
capture off; call doadump; reset


Then once it reboots it should offer to submit the crash report to us.


You could also try switching to the debug kernel, but make sure you take
a backup first in case it doesn't work as expected:


http://doc.pfsense.org/index.php/Switching_Kernels


Jim


Jim, thanks for the input. I will keep you posted. I am considering putting
in a unit with a hard drive in it. This is the second time this has happened
in six months. Otherwise these units have been stable.

