We have a number of LVS clusters set up and have been running them successfully for a few years; occasionally, however, the load balancing plays up and traffic becomes unbalanced: all requests start getting sent to a single backend server and the other backend servers get no traffic.
It doesn't matter which scheduler is used (i.e. rr, wrr, lc, wlc); we still occasionally see this pattern. It's also not the monitoring script taking the backend nodes out of the IPVS table. Our LVS directors run CentOS 4.6 and OpenSUSE 10.1 (unrelated clusters), but we see it happen on both. The following URL [ http://img208.imageshack.us/my.php?image=lvsunbalancedsf2.gif ] has a graph of the backend servers' monitoring; you can see that at about 2:00 traffic to three of the backend servers drops off to nothing and "server 4" starts handling all the traffic. While this was occurring, ipvsadm gave the following output:

$ ipvsadm -ln
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  xxx.xxx.xxx.102:80 rr
  -> xxx.xxx.xxx.9:80             Route   100    0          0
  -> xxx.xxx.xxx.10:80            Route   100    0          0
  -> xxx.xxx.xxx.11:80            Route   100    0          0
  -> xxx.xxx.xxx.12:80            Route   100    0          0
TCP  xxx.xxx.xxx.103:80 rr
  -> xxx.xxx.xxx.12:80            Route   100    0          0
  -> xxx.xxx.xxx.11:80            Route   100    0          0
  -> xxx.xxx.xxx.10:80            Route   100    0          0
  -> xxx.xxx.xxx.9:80             Route   100    0          0
TCP  xxx.xxx.xxx.100:80 rr
  -> xxx.xxx.xxx.9:80             Route   100    0          0
  -> xxx.xxx.xxx.10:80            Route   100    0          0
  -> xxx.xxx.xxx.11:80            Route   100    0          0
  -> xxx.xxx.xxx.12:80            Route   100    0          0
TCP  xxx.xxx.xxx.101:80 rr
  -> xxx.xxx.xxx.12:80            Route   100    1          1
  -> xxx.xxx.xxx.11:80            Route   100    6          0
  -> xxx.xxx.xxx.10:80            Route   100    1          3
  -> xxx.xxx.xxx.9:80             Route   100    1          0
TCP  xxx.xxx.xxx.104:80 rr
  -> xxx.xxx.xxx.9:80             Route   100    0          0
  -> xxx.xxx.xxx.10:80            Route   100    0          0
  -> xxx.xxx.xxx.11:80            Route   100    0          0
  -> xxx.xxx.xxx.12:80            Route   100    0          0

The connection columns have pretty much dropped to zero, even though traffic is still being served on the site. Note that server4 in the graph corresponds to xxx.xxx.xxx.12. To work around the problem we run a script which monitors the VIP; if HTTP requests take too long (due to the single backend node being overloaded), we tell heartbeat to go to standby and swap the VIPs over to the other LVS director.
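For reference, our watchdog does something roughly like the following (this is a simplified sketch, not the exact script: the VIP address, the 5-second threshold, and the hb_standby path are placeholders, and the heartbeat binary location varies by distro):

```shell
#!/bin/sh
# Sketch of the VIP watchdog: time an HTTP request to the VIP and, if it
# is too slow, push the VIPs to the standby director via heartbeat.
# VIP, THRESHOLD_MS and the hb_standby path below are illustrative.

VIP="xxx.xxx.xxx.100"
THRESHOLD_MS=5000

# Return success (0) when a measured response time warrants a failover.
should_failover() {
    elapsed_ms=$1
    [ "$elapsed_ms" -gt "$THRESHOLD_MS" ]
}

# Time one HTTP request to the VIP, in milliseconds. curl's
# %{time_total} is in seconds, so scale it with awk.
probe_ms() {
    curl -s -o /dev/null -m 10 -w '%{time_total}' "http://$VIP/" |
        awk '{ printf "%d", $1 * 1000 }'
}

# Run one probe when invoked as: vip-watchdog.sh check (e.g. from cron).
if [ "$1" = "check" ]; then
    elapsed=$(probe_ms)
    if should_failover "$elapsed"; then
        # Path varies by distro/heartbeat version.
        /usr/lib/heartbeat/hb_standby
    fi
fi
```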
The failover then causes everything to be rebalanced. Anyway, does anyone else have this problem and, ideally, a fix?

Thanks,
Paul
_______________________________________________
LinuxVirtualServer.org mailing list - [email protected]
Send requests to [EMAIL PROTECTED] or go to http://lists.graemef.net/mailman/listinfo/lvs-users
