David Black wrote:
> Yes, I have several IPVS pairs running, some on vanilla 2.6.17.7, others
> on 2.6.16-xen (Xen 3.0), 2.6.18-xen (Xen 3.1) and 2.6.9-57.EL (CentOS
> 4).  All show the same +1.0 load av behavior when
> ipvs-syncmaster/-syncbackup are running.

We're using 2.6.18-xen, and the softlockup_tick errors occur only when 
running LVS on dom0.  LVS works fine on a guest domain (domU).  Just 
be forewarned.

The stack trace always starts with the following few lines before diverging:

kernel: <IRQ> [<ffffffff80258269>] softlockup_tick+0xcc/0xde
kernel: [<ffffffff8020e84d>] timer_interrupt+0x3a3/0x401
kernel: [<ffffffff80258898>] handle_IRQ_event+0x4b/0x93
kernel: [<ffffffff8025897e>] __do_IRQ+0x9e/0x100
kernel: [<ffffffff8020cc97>] do_IRQ+0x63/0x71
kernel: [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
kernel: [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
...


> 
> Since you mention it, I did have problems with heartbeat on Xen - no
> network lockups but just heartbeat being fussy about timing, and decided
> to try keepalived (VRRP)/IPVS, which solved at least the timing issues. 
> No kernel issues as you describe with piranha either.

Potentially on topic, we've seen problems with ntp running on the domUs, 
too.  The dom0 will have the correct time, but the domUs drift and never 
resync.  'tis strange, and I haven't found a solution for this yet.

IIRC heartbeat from linux-ha.org sends a timestamp, which can cause havoc 
if the two HA servers are out of sync, time-wise.  I haven't seen this 
issue with the heartbeat that is used in piranha's pulse - maybe it's 
not so picky wrt timestamps - it's happy as long as it has received a 
ping within the last 6 seconds.  Maybe keepalived isn't so picky, either.
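
To make the distinction concrete, here is a rough Python sketch of the two styles of liveness check. It is not taken from pulse, heartbeat, or keepalived; the names PeerMonitor, record_ping, is_alive, and is_alive_by_sender_timestamp are made up for illustration, and the only figure borrowed from above is the 6-second window. The first check uses only the receiver's own clock, so drift between the two hosts cannot affect it; the second compares against a timestamp supplied by the sender, so it can.

```python
import time

DEAD_AFTER_SECONDS = 6.0  # the 6-second window mentioned above


class PeerMonitor:
    """Hypothetical liveness tracker that uses only the local clock."""

    def __init__(self, dead_after=DEAD_AFTER_SECONDS, clock=time.monotonic):
        self.dead_after = dead_after
        self.clock = clock        # injectable so the logic can be tested
        self.last_seen = None     # local time of the last ping, None if never seen

    def record_ping(self):
        # Stamp the ping with the receiver's clock; any timestamp the
        # sender put in the packet is deliberately ignored.
        self.last_seen = self.clock()

    def is_alive(self):
        if self.last_seen is None:
            return False
        return (self.clock() - self.last_seen) <= self.dead_after


def is_alive_by_sender_timestamp(sender_timestamp, local_now,
                                 dead_after=DEAD_AFTER_SECONDS):
    # Timestamp-comparing variant (an assumption about how a "picky"
    # implementation might behave): clock skew between the hosts counts
    # against the peer even when its pings arrive on time.
    return abs(local_now - sender_timestamp) <= dead_after
```

With the second function, a peer whose clock has drifted by more than 6 seconds looks dead even though its pings arrive on schedule, which would match the kind of fussiness described above if the implementation really works that way.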

Cheers,
Dan

> 
> Dave
> 
> Dan Yocum wrote:
>> Hi Dave,
>>
>>
>> Hopefully you don't have ipvs or lvs running on your dom0?  Before I 
>> knew any better I put the LVS directors on 2 dom0s and ended up with 
>> lots of softlockup_tick kernel "panics" which would invariably bring the 
>> network to a screeching halt on domUs for several seconds - long enough 
>> for nanny (I'm using piranha) to mark a server as offline.
>>
>> Moving the LVS directors to their own xen VM solved these kernel lockups 
>> and network problems.
>>
>> I'm wondering if your first point may have something to do with this 
>> problem.
>>
>> Cheers,
>> Dan
>>
>>
>>
>>   
> 
> 
> _______________________________________________
> LinuxVirtualServer.org mailing list - [email protected]
> Send requests to [EMAIL PROTECTED]
> or go to http://lists.graemef.net/mailman/listinfo/lvs-users

-- 
Dan Yocum
Fermilab  630.840.6509
[EMAIL PROTECTED], http://fermigrid.fnal.gov
Fermilab.  Just zeros and ones.

