On 27/05/2009 15:38, Stuart Henderson wrote:
> Simon Morvan <[email protected]> wrote:
>
>> After a couple of hours/days one of the boxes stops functioning properly:
>> no ping, no more SSH access, but I still capture CARP advertisements on the
>> network segments (when it occurs on the master). As a result, when it
>> happens on the master, the slave does not take over.
>
> A few ideas...
>
> Do you have any different hardware you can try instead to rule out
> some incompatibility with the machines? Have you checked for BIOS updates
> etc that might help?
>
> Can you break into DDB when this happens? (You'll need to set ddb.console=1
> in sysctl.conf and reboot if it's not already set). If you can, trace/ps might
> be useful. If not it's a useful data point. (make sure you can trigger it
> correctly while the system is running normally; ctrl+alt+esc on glass console,
> or BREAK on serial console; then you can 'c'ontinue).

For what it's worth, I haven't had any problems in the 5 days since I
switched the em0 and re0 roles. I can't tell whether it's related to the
NICs themselves. I wish I could run further tests, but this is a production
platform... If I manage to get that type of hardware again, or a comfortable
maintenance window, I'll run a new stress test and let you know.

-- Simon.

