[crossbow-discuss] Updated Crossbow virtualization architecture document

James Carlson Thu, 30 Aug 2007 15:01:31 -0400

Nicolas Droux writes:
> James Carlson wrote:
> > I can understand wanting to set some initial properties at create
> > time, but it seems odd that the new general properties are segregated
> > into VNIC-specific commands.
> 
> No, only set-linkprop will be used to change these properties, not 
> modify-vnic. We'll send out updated man pages to reflect these changes, 
> and they will be different than the man pages that were published as 
> part of our current bits.


OK; thanks.

> > To put the question in another way: suppose I have a non-IP protocol
> > using a VNIC with a bandwidth control set on it.  What happens?  Are
> > there features that were related to squeues that I won't be able to
> > use?  If so, then what are those features?
> 
> The client will see a MAC which has a bandwidth limit, nothing else is 
> required.

That's what I wanted to know.

> > If I (as a system administrator) say "factory" as part of the
> > configuration of the interface, then I'd expect to get a factory-
> > supplied address.  My expectation would be that when the factory-
> > supplied components are swapped out underneath, the address changes.
> 
> Actually there are three sub-cases to this I think:
> 
> 1. The administrator does not specify an address (automatic 
> assignment), and a factory MAC address is assigned to the VNIC. In this 
> case, I think it's fine to assign a different MAC address, e.g. a random 
> one, to the VNIC if the VNIC is moved to a NIC which does not have 
> available factory MAC addresses.

Yes, I agree with that.  That'd be the "auto" case, and I was talking
about "factory."

> 2. If the administrator requested a factory MAC address explicitly, 
> then the VNIC could be moved to a different NIC which has an available 
> factory MAC address. Otherwise the operation would fail unless a force 
> flag is set.

Why would "force" be useful in this case?  What exactly happens if the
operation is "forced," and why couldn't I configure the interface in
that way in the first place?

I'm very leery of administrative options that leave the system in a
state where I couldn't have configured it that way in the first
place.  In this case, I can't configure a VNIC as "factory" if there
are no addresses available, but I can "force" a "factory" VNIC into an
interface with no addresses available.

Does the configuration pop loose (become something other than
"factory") during such a forced move, or does the configuration just
become incorrect, saying "factory" but meaning something else?

> 3. If the administrator requested a factory MAC address of a specific 
> slot, then there's a clear intent of using a specific MAC address of the 
> device underneath. In that case the move operation would fail unless a 
> force flag is set.

I still think this is too divorced from administrative expectations.

When do I move VNICs around and what do I need and expect?  I think
this document should work through some actual usage scenarios and then
come up with usable interfaces based on that, because the current
interfaces seem to be self-referential: they do what they do because
that's what they do.  The "force" flag seems particularly problematic,
as it indicates that things the administrator should be able to do
aren't doable.

The scenarios I can see are:

  - User configures VNIC for the first time on a given NIC.  What
    happens when the "factory" address desired doesn't exist or is in
    use?

  - User wants a VNIC to move from one NIC to another.  Forget about
    "forcing" the operation, and look at the need.  Why am I moving it
    from one to another and what should I expect?

  - The system needs to move a VNIC from one NIC to another (or to
    none at all!) due to DR removal of the assigned NIC.

There might be other variations here.

Here's one possible answer that I think would make a bit more sense,
at least to me, and would be much simpler.

     The "auto" keyword and the "-F" flag go away.  All configurations
     that specify "factory" are implicitly automatic: if the requested
     factory address isn't available, then you get an auto-generated
     one and perhaps a warning message.  If you really care which kind
     of address you get, then look at the MAC address -- it'll have
     the "local" flag set if it was auto-generated.

     When moving from one interface to another, if "factory" is
     selected, it's the same as configuring the interface for the very
     first time.  If the requested address is available on the new
     (destination) NIC, then it's used.  If it's not, then an auto-
     generated address is used instead.

     The system doesn't have a way to be obstinate about using a
     particular factory-assigned address, and failing otherwise.  If
     you need to have a never-changing address, then assign one
     manually or use the "random" option, as neither of these options
     relies on data supplied by the hardware itself.  Factory
     addresses are, by definition, "ephemeral" from the point of view
     of a VNIC -- they're tied to the hardware, not to the VNIC.
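A minimal sketch of that policy, to make the proposal concrete.  The
function names, the slot-to-address map, and the example address are
all hypothetical and not part of any Crossbow interface; the only fixed
fact used is that the "local" (locally administered) flag is the 0x02
bit of the first octet of a MAC address.

```python
import secrets
from typing import Dict, Optional, Tuple

LOCAL_BIT = 0x02      # locally administered bit in the first octet
MULTICAST_BIT = 0x01  # group bit; must be clear for a unicast address


def is_locally_administered(mac: bytes) -> bool:
    """True if the "local" flag is set, i.e. not a factory address."""
    return bool(mac[0] & LOCAL_BIT)


def auto_generate() -> bytes:
    """Random unicast address with the local flag set."""
    raw = bytearray(secrets.token_bytes(6))
    raw[0] = (raw[0] | LOCAL_BIT) & ~MULTICAST_BIT & 0xFF
    return bytes(raw)


def pick_address(free_factory: Dict[int, bytes],
                 slot: Optional[int] = None) -> Tuple[bytes, bool]:
    """Hypothetical "factory" policy: use a free factory address if there
    is one (a specific slot if requested), else fall back to an
    auto-generated one.  Returns (address, fell_back); the caller might
    print a warning when fell_back is true."""
    if slot is not None:
        addr = free_factory.get(slot)
    else:
        addr = next(iter(free_factory.values()), None)
    if addr is not None:
        return addr, False
    return auto_generate(), True
```

Under this sketch, moving a VNIC is just calling pick_address() again
with the destination NIC's free factory addresses, so there is no
separate "force" path.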

> > Having the factory-supplied address come unmoored from the device
> > itself seems odd to me, and almost certain to cause trouble.  I
> > suppose it could be possible to create an "adopt the factory address
> > and treat it as though it were my own statically-configured address"
> > option, but I'd certainly want to see it come with adequate warnings
> > about the dangers and a clear user interface (not "factory" but
> > "steal-from-factory" ;-}).  I'm not sure that it'd be administratively
> > interesting, though.
> 
> Yes, there's a risk of duplicate addresses if that option were chosen 
> and the source NIC ended up being recycled later. That's less than ideal.

Actually, it's potentially a disaster if it happens.  If moving
factory addresses around among NICs is actually an important
administrative requirement, then, in terms of ARC review, I'd feel
TCR-strong that the system _must_ prevent duplicates from forming
somehow.  Or just not include that feature.

> > I've seen similar schemes for access servers (most have proprietary
> > RADIUS extensions for setting bandwidth limits), and the usual way
> > this works is that once the link is saturated, the configured limits
> > become shares.  Thus, the clients are all hurt in proportion to the
> > amount of bandwidth they're given.
> 
> The limits are really used to clamp down on bandwidth utilization by a 
> MAC, but they do not imply any guaranteed bandwidth. As a future 
> deliverable we're also planning to provide bandwidth guarantees, which is 
> what you seem to be referring to here.

Actually, no, that's not quite what I'm referring to.

A bandwidth limit is an upper bound.  If the user tries to send more
than that, then he'll experience delay and loss.  There's no guarantee
that he'll be able to send that much, but he won't be able to send
more.

A bandwidth guarantee is a lower bound.  It's a reservation.  The user
must always be able to get at least a given amount.  This project
doesn't supply guarantees.

Quite apart from those definitions, though, is the issue of fairness.
In this case, I *am* talking about limits, but I'm also talking about
what happens when the limit is unachievable.  In the implementations
I've seen (Cisco and Ascend are pretty good references for this), the
limit becomes a share because this sort of behavior preserves
fairness.

Suppose we have twenty users with 10Mbps limits, and one user with a
50Mbps limit.  They're all on a 100Mbps pipe.  If ten of those 10Mbps
users can together lock out all of the others from using any of the
pipe bandwidth at all, then that's an "unfair" result.

A very simple, but "fair," result would be that, in the limit with
everyone sending flat-out, the 50Mbps-limited user would get 20% of
the bandwidth, or 20Mbps.  The twenty 10Mbps users would share the
remaining 80%, or 4Mbps each.  Thus, each user would end up with 40%
(which is 100/250 and 20/50 and 4/10) of his maximum.
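The arithmetic can be checked with a few lines.  This is only a sketch
of the "limits become shares" behavior described above, not a claim
about how Crossbow (or any particular access server) implements it; the
function name is made up for illustration.

```python
def scaled_shares(limits, capacity):
    """When the configured limits oversubscribe the link, scale every
    limit by the same factor so the total equals the link capacity.
    Otherwise each user simply gets up to its limit."""
    total = sum(limits)
    if total <= capacity:
        return [float(limit) for limit in limits]
    return [limit * capacity / total for limit in limits]


# Twenty users limited to 10Mbps and one limited to 50Mbps on a
# 100Mbps pipe, everyone sending flat-out.
limits = [10] * 20 + [50]
shares = scaled_shares(limits, 100)
print(shares[0], shares[-1])  # 4.0 20.0
```

Every user ends up with the same fraction (100/250) of his configured
limit, so no group of users can lock the others out.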

Other results are possible, including splitting the various kinds of
users into priority classes.  I assume that's not what's going on
here, though it's not clear.

The point is that, although the answer could just be that it's
inherently unfair, and that's how it is, I don't see how an inherently
unfair system is something that people could use in practice.  Does it
make sense to do that?

> > You're going to need a function to change the values after mac_open()
> > time.  By supplying the same values during mac_open(), you're just
> > duplicating that functionality.
> 
> It might be a single "piece of code" which can be called to allocate 
> resources according to these parameters from both the open and modify 
> functions. I think the duplication can be avoided.

Then the duplication is only in the API.

> > Why is the resource allocation itself an important thing to optimize
> > versus the interface stability and scalability?
> 
> I don't agree with the "core function" vs "periphery" argument. The 
> resource control is becoming an integral part of the MAC layer, and 
> there shouldn't be a need to do "extra steps" to enable that functionality.

Opening the device is clearly core functionality -- you can't do much
if you can't open it.  It sounds like you agree that if those
arguments weren't present, then some "default" set of resources would
need to be allocated.  Thus, I argue that the functionality isn't core
to the goal of getting access to the mac layer.

So, the disagreement is on whether every consumer needs to set up
resource controls.  I'm not sure that they do.  But if they do, aren't
there other things they also "need to" set up, and should all of those
things be mac_open() arguments?
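To illustrate the shape I have in mind (in Python rather than kernel C,
purely as a sketch): the open call takes no resource arguments and
allocates defaults, and a single separate call changes properties
afterwards.  Apart from the name mac_open(), everything here, including
mac_set_props() and the "maxbw_mbps" property, is hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical property table; None means "no limit configured".
DEFAULT_PROPS = {"maxbw_mbps": None}


@dataclass
class MacHandle:
    name: str
    props: dict = field(default_factory=lambda: dict(DEFAULT_PROPS))


def mac_open(name):
    """Open with default resources only; no resource arguments."""
    return MacHandle(name)


def mac_set_props(handle, **props):
    """The one place where properties change after open.  New properties
    can be added to DEFAULT_PROPS without touching mac_open()."""
    unknown = set(props) - set(DEFAULT_PROPS)
    if unknown:
        raise ValueError("unknown properties: %s" % sorted(unknown))
    handle.props.update(props)


handle = mac_open("vnic0")
mac_set_props(handle, maxbw_mbps=10)
```

A consumer that doesn't care about resource controls never calls the
second function, and adding a new property later doesn't change the
signature of the open call.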

> But I agree with your point about designing an API which allows more 
> options to be added in the future without breaking backward 
> compatibility. However I think this can be made to work without 
> requiring a separate call. I'll need to take a closer look at this.

OK.

> > The document says it must be -1.
> 
> Yes, and I need to fix the document to allow a slot number to be passed 
> when that MAC address type is specified.

OK.

> > It's also duplicate logic.  Why optimize for system call counts versus
> > kernel code complexity?
> 
> There's additional code in the kernel, but that logic is very simple.

I'll give up on this point.  I don't think the duplication is
worthwhile, even if it's "simple," as this sort of thing often leads
to trouble when alternate policies are devised, but it's something
hidden in the implementation that can be ripped back up later if
necessary.

> Yes, this will of course be fully documented. If we find an efficient 
> way to do bandwidth control across multiple rings in the future, I don't 
> see why we wouldn't be able to make use of that functionality.

Not just "fully documented," but the design constraint around the
units of control (being individual NIC instances) needs to be clearly
described.  Maybe I'm atypical but, as a user, this wouldn't be
obvious to me.

-- 
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
