Re: [networking-discuss] Design review for the VRRP project

James Carlson Thu, 30 Oct 2008 12:47:38 -0700

Huafeng Lu writes:
> On 2008-10-28 13:10, Huafeng Lu wrote:
> > I think it is.  SMF potentially solves some interesting issues,
> > including:
> >
> >   - It allows administrators to specify what applications are the
> >     dependencies for VRRP's control of an address.  As a user, I can
> >     add a dependency in my http:apache2 instance saying that
> >     vrrp:apache2 should not come on-line unless Apache is running.  If
> >     the server fails, the system takes vrrp:apache2 off-line, and we
> >     get the expected fail-over.
> 
> Hi, Jim,
> 
> The dependency can be considered some kind of "health check" mechanism 
> when VRRP is used to protect a service. I think it's the biggest 
> advantage of factoring VRRP into SMF. When VRRP is used to handle host 
> failure (the "standard" goal of VRRP, as specified in the RFC), no 
> dependency is needed.


The RFC talks about protecting the forwarding service ... where does
it talk about host failure?

> But there's a problem. Take the http service as an example. As you 
> pointed out, vrrp:apache2 depends on http:apache2, so VRRP should run 
> after the web server is started. On the other hand, VRRP is used to 
> protect the VRIP that is shared by the master and backup hosts. 
> Before VRRP is running, the VRIP may not appear on the host, so the web 
> server application may not be able to run. (Some applications may bind to 
> INADDR_ANY, but we cannot assume how the applications are designed.)

I think that issue is intimately tied up with another problem: dealing
with the "no-receive-but-sometimes-receive" behavior on the non-owning
system for normal (routing) VRRP.

It would be nice if both systems behaved normally, meaning that the IP
address is configured on both systems and the application binds to
that address without trouble.  On the backup system, the application
just sits there idle and doesn't see requests because VRRP has
disabled that logic.

The reason I suggest doing it that way is that you end up with fewer
moving parts in the end -- fewer components are involved in the actual
fail-over process -- and fewer parts means that there's less to go
wrong.  You don't have to answer the inevitable questions about
switching, such as "what if I fail over, and then the application is
unable to bind to the address?"
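
To make the bind question concrete, here is a minimal sketch (not taken
from any VRRP implementation) of what an application sees when it binds
to INADDR_ANY versus a specific address. The helper name try_bind and
the address 192.0.2.10, a documentation address standing in for a VRIP
that is not configured locally, are assumptions for illustration only.

```python
import errno
import socket


def try_bind(addr, port=0):
    """Try to bind a TCP socket to addr; return (ok, reason)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((addr, port))
        except OSError as e:
            # Typically EADDRNOTAVAIL when the address is not configured.
            return False, errno.errorcode.get(e.errno, str(e))
        return True, "bound"


# INADDR_ANY does not depend on any particular address being present.
print("0.0.0.0    ->", try_bind("0.0.0.0"))

# Hypothetical VRIP; the result depends on whether the local system
# actually has this address configured.
print("192.0.2.10 ->", try_bind("192.0.2.10"))
```

If the address is configured on both systems all the time, the second
case should behave the same before and after a fail-over, which is the
point of the suggestion above.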

Plus, it breaks a feedback loop: it looks like the health of the
service affects whether VRRP can fail over by changing its state, and
VRRP failing over forces the service to change state.  It'll be hard
to show that a system like that doesn't have obscure failure modes.

If the architecture is tied to VNICs (and there's no plan to separate
that), then that gives us a potential handle for controlling how the
system responds.  There appear to be at least two separate switches
here: allow low-level traffic in and out (for ARP, ND, RDISC), and
allow application-level traffic in and out.
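
As a rough sketch of how those two switches might map onto VRRP states:
the names below (State, Switches, switches_for) and the accept flag are
hypothetical and not part of any existing design. The sketch assumes
the RFC 3768 rule that a master that is not the address owner answers
ARP for the virtual address but does not accept packets addressed to
it, and treats "accept" as an opt-in for the service-protection case.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    INITIALIZE = "initialize"
    BACKUP = "backup"
    MASTER = "master"


@dataclass(frozen=True)
class Switches:
    low_level: bool  # ARP, ND, RDISC for the virtual address
    app_level: bool  # traffic delivered to local applications


def switches_for(state, is_owner=False, accept=False):
    """Map a VRRP state to the two hypothetical per-VNIC switches."""
    if state is not State.MASTER:
        # Address stays configured, but nothing goes in or out.
        return Switches(low_level=False, app_level=False)
    # The master always answers for the address; local delivery is
    # allowed only for the address owner or when explicitly requested.
    return Switches(low_level=True, app_level=is_owner or accept)


for st in State:
    print(st.name, switches_for(st), switches_for(st, accept=True))
```

In this model the application stays bound on both systems, and a
fail-over only flips the switches.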

Just a thought.  I think there may be other ways to implement this as
well.

> Note, in most cases, people will use a real IP address that is on the 
> master host as the VRIP, so the application may run on the master. But 
> it's also possible that the VRIP is not actually on either host.

If it's not on either host, then what does VRRP do?

> Further, the application should run normally on both hosts (whether 
> backup or master), and when the backup becomes master, the new-master 
> can take the role naturally. But the VRIP can be on only one of the 
> hosts at a given time. So even if VRRP runs before the application starts, 
> it won't work. A possible solution is to register external scripts to 
> start the application when VRRP enters the MASTER state, and to stop the 
> application when it leaves. But this looks too cumbersome.
> 
> So it looks like VRRP is not suitable to protect such local services.

I think that not protecting local services means that the "high
availability" story has a pretty big hole.  It's able to protect
against the relatively unlikely event of a previously-working but
now-failed Ethernet port, but not against the more likely failure of
software.

Maybe I don't understand enough about the marketing requirements for
VRRP, but I thought that part of the point of VRRP was to protect the
service and not just the interfaces.

-- 
James Carlson, Solaris Networking              <[EMAIL PROTECTED]>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677

_______________________________________________
networking-discuss mailing list
[email protected]
