On 10/27/08 10:10, James Carlson wrote:
yifan xu writes:
The VRRP project team invites you to review our current design. The document can be found at:

http://www.opensolaris.org/os/project/vrrp/vrrp_design.pdf

I'll skip over most of the SMF instance discussion now happening on
[EMAIL PROTECTED], but I think that thread needs to be redirected
here, so that the networking community can comment on it.  The open
review isn't complete without it.  (You have my permission to resend
any of my messages on that topic to the list if you need.)

One high-level issue: this document appears to be mostly a functional
description of the new service and the library.  Will there be a
separate detailed design document to discuss the internal data
structures, algorithms, and any locking or lifetime issues, or is
section 7 intended to cover that as well?  (Or is it simple enough
that no formal design is needed?)

I don't think I understand VRRP groups.  The design describes in some
detail how to configure them and some of how they operate, but it
leaves out three crucial bits of information:

  a.  Why do we need them?  What problem do they solve or what
      necessary feature do they allow us to have?

  b.  What does it mean for a group to be in master or backup state?
      What things are true in each case?  An example that includes a
      simple configuration with two instances in a group and walks
      through a complete fail-over event would be very useful.

      (It appears that if one instance within a group of several fails
      over, then the "group" on multiple systems will be in state
      "Master" at the same time, because each of these systems will
      have an instance that's still in "Master" state.  Is that
      intentional?)

  c.  Do the instances within a group still behave independently?  In
      other words, if I have to fail over one interface within a
      group, does that fail-over still occur?  Or does group
      membership affect individual instance behavior in some way,
      preventing fail-over of an address when it otherwise would have
      happened?

In looking through other implementations, it seems that nobody else
has these higher-level group structures.  Perhaps more importantly,
the term "VRRP group" already has a defined meaning in the industry:
it's a set of routers managing a virtual IP address.  It's what you're
calling an "instance."  Exposing the higher-level structures to users
as "groups" sounds like something that will be confusing -- because
"group" doesn't mean here what it means everywhere else.

(I suspect that if VRRP is factored out into SMF properly, then the
existing "group" notion goes away, and it becomes just an
"optional_all" dependency for some other service that cares about VRRP
state.  Similarly, the startup/shutdown calls become redundant with
the vrrp service state.)

The VRRP group concept was devised to handle fail-over of a primary load
balancer that serves multiple virtual services.  Rather than designate a
master and a backup per virtual service, the idea was to have one master
for all of the virtual services: invoke one VRRP instance per virtual
service, bind all of the instances to a single group, and fail them over
as a whole.  Fail-over from primary to standby occurs only when the
standby fails to hear VRRP advertisements for *all* instances in that
group.

Does vrrpd itself send RA messages? If so, why? Shouldn't that be
the responsibility of in.ndpd?  (The RFC describes what behaviors are
required for the whole system, but doesn't specify which daemon needs
to take the required actions.)  Section 7.6 doesn't describe how the
required behavior will be achieved by the project.

How does the daemon interact with services that need to be protected?
If I'm using VRRP for fail-over in the host-type application scenario,
I'd expect that if the application fails or is administratively
disabled, VRRP would trigger a fail-over to the backup.  How does that
happen?  If I'm using it for the router-type scenario, then loss of
the routing daemon(s) or disabling forwarding should cause VRRP to
stop.

At a guess, it appears that vrrpd is expecting those services to call
"vrrpadm startup" when launched and then call "vrrpadm shutdown" when
they fail.  If that's correct, then where's the rest of this project?
What project actually inserts those calls into the services that can
be protected with VRRP?  Without that linkage, this daemon won't
actually do anything in a production system.

What external things should happen when VRRP changes state?  What's
the expected usage case for the registered shell scripts?  What
problem do these scripts solve?  Examples would be very helpful here.

Section 4.1: the description of the 'INITIALIZE' state appears to be
incomplete.  The fact that the RFC describes the state doesn't really
explain how to implement it; "daemon not running" could also be a
plausible representation, and nothing here explains why you'd choose
one implementation over another.

Worse, the text warns of unspecified problems in operation if all of
the instances are not started at one time, and then presents a
physically impossible solution: starting them all at once violates at
least causality.  The necessity of doing this requires some
explanation.  Are you really saying that (for example) if one system
has its power removed, then the other cannot be rebooted?  That's what
I think is implied here, and it doesn't make sense to me.

How does a protocol that is intended to solve high-availability
problems end up having complicated failure modes related to
availability?

Section 5.6: this part raises a number of questions:

  a. Why are separate registrations needed?  In other cases where
     we've done this (DHCP eventhook, PPP ip-up), we've simply created
     a single well-known path that is invoked by the daemon, and if
     there's a user-supplied script there, it runs.

  b. How can a shell script be "unregistered?"  There's a registration
     function, but the document doesn't describe how anything is
     unregistered.  (Empty string for the 'file' argument?)

  c. What happens if an existing vrrp_state_trans_t value is set to a
     new script?  Does the system support multiple scripts per state
     transition value, or just one?

  d. Why isn't the state transition value just an argument to the
     script?  Why have separate scripts per transition value?  What's
     the usage model?

  e. Why not just specify a set of arguments that will be presented to
     the script?  I don't understand the value in allowing the client
     to specify its own command line.

Section 7.1: If I have multiple copies of vrrpd running at once, which
3.2 appears to encourage, then where are the UNIX datagram socket
control interfaces for each one placed?  This section documents only
one file path.

If I have multiple copies of vrrpd running, how do I specify which one
I want to talk to via libvrrp and via the CLI?

If I can't have multiple copies of the daemon running at once, then
what's the point of section 3.2?  Why document a means for the user to
invoke with different command line arguments?

Section 7.2: can you explain in more detail what IP packets you need
to send that cannot be sent using the existing IP interfaces on
Solaris?  Rather than resorting to DLPI, is it possible to fix IP to
allow you to do what you need?  We've done this for DHCP, and it'd be
nice to have VRRP 'fixed' before integration.  (For what it's worth, I
think the reference to PF_PACKET is likely out of place here.  Solaris
uses DLPI where Linux would use PF_PACKET, but it's unclear to me
whether and why VRRP needs either of these.)

Using DLPI outside the context of ifconfig is a bit problematic, as it
inhibits DR.

Section 7.3: this part talks about command line arguments specifying
interface information.  If the daemon itself supports multiple
instances at one time, how do those command line arguments work?

Section 7.5: if Accept_Mode is False (the default), then how do you
prevent the non-owner from accepting and processing packets sent to
the virtual IP address?  It looks like what you're doing is setting
static ARP/ND entries and leaving out the VNIC configuration.  Is that
correct?  If so, then what happens when we get one of those packets?
Won't we be tempted to send an improper ICMP Redirect?

Nit: in the last paragraph in this section, I think that "VRIP on the
physical interface" actually refers to the static ARP/ND entries, and
not any physical interface configuration.  More importantly, nobody
should be switching between "owner" and "not owner" except by explicit
configuration commands, so the described switch shouldn't be needed.
It's not part of fail-over.

Section 7.6: how are these changes made?  Does something in the system
itself need to change, or is this just an administrative configuration
issue that needs to be documented for users?

Section 7.7: libinetutil includes functions for managing timers and
sockets.  Would it be possible to take advantage of this library?

Section 7.7: as a design issue, using a separate thread to implement
timers sounds messy to me.  Why is this needed?  The usual way this is
done with poll(2) is to keep the timers on a sorted list, so that you
can always look at the soonest-to-expire and use that for the timeout
parameter.  What does a separate thread do differently?

Section 9.1: nit: I'm assuming this text means that the privilege
enforcement occurs within the daemon.

Section 9.1: what does "super-user privilege" mean here?  Does it mean
"all privileges?"  (This case goes away if 'register' isn't needed
because a fixed script location is sufficient.)


_______________________________________________
networking-discuss mailing list
[email protected]
