David Black wrote:
> Yes, I have several IPVS pairs running, some on vanilla 2.6.17.7, others
> on 2.6.16-xen (Xen 3.0), 2.6.18-xen (Xen 3.1) and 2.6.9-57.EL (CentOS
> 4). All show the same +1.0 load av behavior when
> ipvs-syncmaster/-syncbackup are running.

We're using 2.6.18-xen, and the softlockup_tick errors only occur when
running LVS on dom0. LVS works fine on a guest domain (domU). Just be
forewarned.

The stack trace always starts with the following few lines before
diverging:

kernel: <IRQ> [<ffffffff80258269>] softlockup_tick+0xcc/0xde
kernel: [<ffffffff8020e84d>] timer_interrupt+0x3a3/0x401
kernel: [<ffffffff80258898>] handle_IRQ_event+0x4b/0x93
kernel: [<ffffffff8025897e>] __do_IRQ+0x9e/0x100
kernel: [<ffffffff8020cc97>] do_IRQ+0x63/0x71
kernel: [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
kernel: [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
...

>
> Since you mention it, I did have problems with heartbeat on Xen - no
> network lockups but just heartbeat being fussy about timing, and decided
> to try keepalived (VRRP)/IPVS, which solved at least the timing issues.
> No kernel issues as you describe with piranha either.

Potentially on topic, we've seen problems with ntp running on the domUs,
too. The dom0 will have the correct time, but the domUs drift and won't
come back. 'Tis strange, and I haven't found a solution for this yet.

IIRC, heartbeat from linux-ha.org sends a timestamp, which can cause
havoc if the two HA servers are out of sync time-wise. I haven't seen
this issue with the heartbeat that is used in piranha's pulse - maybe
it's not so picky wrt timestamps - it's happy as long as it has received
a ping within the last 6 seconds. Maybe keepalived isn't so picky,
either.

Cheers,
Dan

>
> Dave
>
> Dan Yocum wrote:
>> Hi Dave,
>>
>>
>> Hopefully you don't have ipvs or lvs running on your dom0? Before I
>> knew any better I put the LVS directors on 2 dom0s and ended up with
>> lots of softlockup_tick kernel "panics" which would invariably bring the
>> network to a screeching halt on domUs for several seconds - long enough
>> for nanny (I'm using piranha) to mark a server as offline.
>>
>> Moving the LVS directors to their own xen VM solved these kernel lockups
>> and network problems.
>>
>> I'm wondering if your first point may have something to do with this
>> problem.
>>
>> Cheers,
>> Dan
>>
>
> _______________________________________________
> LinuxVirtualServer.org mailing list - [email protected]
> Send requests to [EMAIL PROTECTED]
> or go to http://lists.graemef.net/mailman/listinfo/lvs-users

--
Dan Yocum
Fermilab 630.840.6509
[EMAIL PROTECTED], http://fermigrid.fnal.gov
Fermilab. Just zeros and ones.
