Re: [networking-discuss] Design review for the VRRP project

James Carlson Fri, 31 Oct 2008 06:42:13 -0700

Huafeng Lu writes:
> On 2008-10-31 02:56, James Carlson wrote:
> In most cases, "not hearing heartbeat" means the peer host failure. Of
> course it could be an interface failure or link failure, but the VRRP
> protocol is unable to distinguish them.


The point is that there's no way to tell, and that it protects against
multiple sorts of failures, not just the complete loss of the host.
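
As a rough sketch of why the Backup can't tell these cases apart: all it
has is a timer. The snippet below uses the VRRPv2 timer definitions from
RFC 3768 (Skew_Time and Master_Down_Interval); the function names are
made up for illustration and are not from any real implementation.

```python
def master_down_interval(priority, advert_interval=1.0):
    """Seconds a Backup waits without advertisements before taking over."""
    # RFC 3768: Skew_Time = (256 - Priority) / 256, so a higher-priority
    # Backup times out slightly sooner than a lower-priority one.
    skew_time = (256 - priority) / 256.0
    return 3 * advert_interval + skew_time


def master_presumed_down(last_advert_time, now, priority, advert_interval=1.0):
    """True once the silence exceeds Master_Down_Interval.

    The result is the same whether the Master host died, its interface
    failed, or the link between the two systems broke.
    """
    return (now - last_advert_time) > master_down_interval(priority, advert_interval)
```

The only input is "time since the last advertisement", which is why the
cause of the silence is invisible to the protocol.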

> > The reason I suggest doing it that way is that you end up with fewer
> > moving parts in the end -- fewer components are involved in the actual
> > fail-over process -- and fewer parts means that there's less to go
> > wrong.  You don't have to answer the inevitable questions about
> > switching, such as "what if I fail over, and then the application is
> > unable to bind to the address?"
> 
> 
> Understood. Using some method (being discussed but not determined), the
> VRIP can be configured on both hosts, but the backup system doesn't send
> or respond to ARP, ND or RA messages, so that the outside world is not
> aware of its existence. Thus, the applications can start, but traffic
> only goes to the master host.

Yes, that's the idea.

> But there's still a problem. VRRP must run to set up the VRIP before
> applications can bind to the VRIP, so semantically the app depends on
> VRRP. When the application is running, VRRP needs to depend on it to
> protect it. Thus, VRRP and the application depend on each other. Seems
> it's not easy for SMF to handle such dependencies.

Why does VRRP itself need to set up the virtual IP address?

VRRP clearly needs to control how the system responds to that address,
but couldn't it be configured through the usual means (i.e.,
/etc/hostname.*)?

> We may try:
> 
> Method 1: In http:apache2's start method, first create vrrp:apache2
> (with no dependencies) to set up the IP, then start the httpd program,
> then add the dependency to vrrp:apache2. This probably won't work, since
> all these are done in the start method, and before the start method
> exits the vrrp:apache2 service (logically) shouldn't be considered
> "online", thus if we add the dependency to vrrp:apache2, since
> http:apache2 is still not "online", logically vrrp:apache2 shouldn't
> work normally.

Changing dependencies on the fly doesn't sound like a good solution to
the problem to me.

> Method 2: do all necessary setup and cleanup in vrrpadm. In "vrrpadm
> create", first create the VRRP instance and the SMF vrrp:apache2
> instance (which depends on http:apache2). Vrrp:apache2 should be in the
> offline state since its dependency (http:apache2) is not ready, but the
> VRRP instance can stay in the INIT state so that the VRIP can be set up.

I don't follow.  If something is in 'offline' state, then it's not
running at all.  It can't stay in "INIT" state (unless "INIT" means
"process not running at all"), nor can it set up the virtual IP
address.

> Next start the http:apache2 service; this can start successfully since
> the VRIP is ready. Finally send a "startup" event to the VRRP instance
> to bring vrrp:apache2 on-line. This should probably work, but looks
> quite like a hack.

Yes, it does.  And it doesn't address what happens if http:apache2
fails.

> If a VRRP instance is not factored as a SMF service instance, things
> would be easier:
> 
> Method 3 (modification to method 1): VRRP instance is not SMF instance.
> We can change http:apache2's start method to first create the VRRP
> instance to set up the VRIP and then start apache; similarly, in the
> stop method, destroy the VRRP instance after killing apache. Thus, if
> http:apache2 is killed by accident or disabled intentionally using
> "svcadm disable", the stop method is called to destroy the VRRP
> instance. (When killed by accident, SMF will probably call the start
> method to bring it online again. The gap between destroying and
> recreating the VRRP instance should be short enough that the peer
> won't become MASTER.)

That's plausible, but it gets us back to the state where we have to
modify every service that might use VRRP so that it can invoke those
actions.  I'm not convinced that's the right plan, but if it is, then
what project will make all those modifications?

The start and stop methods of the existing services aren't considered
"administrative interfaces" for users to modify, are they?

> If we want to use a VRRP instance to protect multiple services that are
> using the same IP address, we can add the same "vrrpadm create" command
> to their start method, although only one of them will successfully
> create the instance (the other will fail with "object already exists").
> Similarly, all stop methods have the same "vrrpadm destroy" command,
> thus the first failure of these services will cause the fail-over.
> 
> This solution looks cleaner.

That's functional, but I'm not sure it's "cleaner."

The equivalent (in SMF terms) would be a 'require_all' dependency,
which seems quite a bit cleaner to me.
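
To make the comparison concrete, here is a minimal sketch of what a
'require_all' grouping evaluates. It is plain Python, not SMF code, and
assumes the smf(5) rule that a cited service counts as running when it
is online or degraded. "http:apache2" is the instance from this thread;
"ftp:example" is a hypothetical second service sharing the address.

```python
# States in which SMF treats a cited service as running.
RUNNING_STATES = {"online", "degraded"}


def require_all_satisfied(dependency_states):
    """Return True only if every cited service is currently running."""
    return all(state in RUNNING_STATES for state in dependency_states.values())


# Both services up: the dependent VRRP instance may run.
both_up = {"http:apache2": "online", "ftp:example": "online"}

# One service has failed: the dependency is no longer satisfied.
one_down = {"http:apache2": "maintenance", "ftp:example": "online"}
```

With this model, the first service failure stops the dependent instance,
which is the same fail-over trigger as the duplicated "vrrpadm destroy"
calls, but expressed once as a dependency rather than in every stop
method.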

> Thus, VRRP instances can be divided into
> two types:
> 
> (1) To handle host failure. They're persistent; their configuration is
> stored in the configuration file. They will be created after system boot
> when svc:/network/vrrp:default is started.

What's the point in protecting an IP address for a system with no
services?  I don't think I understand the usage case.

> (2) To protect certain service(s). These should be temporarily created
> by the services. "Temporarily", because they're associated with the
> services, so should be created and stopped by the services. Their
> configuration is not stored in the configuration file and they won't
> be started by vrrp:default.

... leaving that service nothing to do.

> >> Note, in most cases, people will use a real IP address that is on the 
> >> master host as the VRIP, so the application may run on the master. But 
> >> it's also possible that the VRIP is not actually on either host.
> > 
> > If it's not on either host, then what does VRRP do?
> 
> Sorry, I should have stated it more clearly.
> By "not on either host", I mean the VRIP is not a "real" IP address that
> is already configured on a host. When VRRP is in control, the VRIP will
> appear on one of the hosts.

OK.

> > Maybe I don't understand enough about the marketing requirements for
> > VRRP, but I thought that part of the point of VRRP was to protect the
> > service and not just the interfaces.
> 
> My understanding is that VRRP was originally designed for routers
> (forwarding service), thus whether it can be used for other purposes
> really depends on the implementation. Yesterday I thought the VRIP can
> only appear on one host (according to the current design), so it's
> unable to protect a service; but if we can have VRIP configured on both
> hosts (although the address on the backup host is not seen by the
> outside world, as you suggested), VRRP can be used to protect services.
> Please correct me if I'm wrong.

I think the two are separate ideas.

The issue with protecting a service is that you want to try to make
sure that fail-over occurs if the service itself is damaged.  It's not
enough that the kernel is still running (and thus 'vrrpd' still
functions); the system has to be providing the requested service, or
there's no point in allowing that system to be Master.

If you don't do at least this, then I think it's questionable whether
VRRP really buys the user anything over having no protection at all,
or using something simple like two servers with separate addresses and
round-robin DNS with short TTL.  It introduces some new failure modes
(such as: service running fine on Backup, but Master has failed
service; user sees service outage) and generally adds complexity.

I think the severity of that issue is essentially the same, regardless
of whether the service you're protecting is forwarding or some
host-type service.
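
One possible way to tie mastership to service health, sketched below, is
to have the Master advertise priority 0 when the protected service stops
working. In RFC 3768, priority 0 means the current Master is releasing
the role, so the Backup takes over without waiting for the full timeout.
This is an assumption for illustration only, not something the design
under review specifies, and the function names and the `check_service`
callback are hypothetical.

```python
def advertised_priority(configured_priority, service_healthy):
    """Priority to place in the next advertisement.

    A healthy system advertises its configured priority; a system whose
    service has failed advertises 0 to give up the Master role.
    """
    return configured_priority if service_healthy else 0


def next_advertisement(configured_priority, check_service):
    """Run the (caller-supplied) health check and pick the priority."""
    healthy = bool(check_service())
    return advertised_priority(configured_priority, healthy)
```

For example, `next_advertisement(100, lambda: False)` yields 0, which
avoids the "Master has failed service, Backup is fine, user sees outage"
case described above.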

The second issue is whether VRRP can be extended in order to be used
to protect host-type services.  Doing that seems to require at least
two things:

  1. an option to allow local reception of packets on the non-owning
     system.

  2. a way to bind local applications to the virtual address.

The really new part is (1), which is the Accept_Mode flag out of the
I-D.  The other part is just a generalization: you need to allow
_some_ local reception in order to make NS/NA and RS/RA work, as well
as IPv4 RDISC.  The trick is to make that feature available to more
applications.
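
A simplified sketch of the local-reception decision, as I read the
Accept_Mode text in the I-D: a Backup never accepts packets sent to the
virtual address; a Master accepts them if it owns the address or if
Accept_Mode is set; otherwise only the neighbor/router discovery traffic
mentioned above gets through. The function and parameter names are
invented for illustration, and the discovery exception is reduced to a
single boolean.

```python
def accept_locally(is_master, is_owner, accept_mode, is_discovery_packet):
    """Decide whether a packet addressed to the virtual IP is received locally."""
    if not is_master:
        # A Backup must not accept traffic for the virtual address.
        return False
    if is_owner or accept_mode:
        # The address owner, or a Master with Accept_Mode enabled,
        # receives everything sent to the virtual address.
        return True
    # Non-owning Master without Accept_Mode: only the minimum needed for
    # NS/NA, RS/RA, and similar discovery exchanges.
    return is_discovery_packet
```

Item (2) then amounts to letting ordinary applications benefit from the
`accept_mode` branch rather than only the discovery special case.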

-- 
James Carlson, Solaris Networking              <[EMAIL PROTECTED]>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
