Jim,

Thanks for the comments.

James Carlson wrote:
> Nicolas Droux writes:
>> http://opensolaris.org/os/project/crossbow/Docs/crossbow-virt.pdf
>
> I have a few questions about this. I've also read through as much of
> the crossbow-discuss archives as seemed to be related to these
> topics, and didn't find answers there.
>
> - Why are bandwidth, CPU control, and MAC address assignment
>   exclusively a VNIC feature, at least at the administrative level?
>   Section 4.7 seems to say that MAC instances will get these
>   features, so shouldn't this be "modify-dev" instead?

Bandwidth control, CPU mapping, and fanout are not exclusive to VNICs. They will be expressed as properties, and will be applicable to non-VNIC data-links as well. This will be described in detail in another upcoming document. I'll see what I can do to make that clearer in the virtualization document I sent out for review.

From the administration interface point of view, there are two ways to associate properties with data-links. For data-links that are created through a dladm subcommand such as create-vnic, the initial set of properties can be specified through a dedicated option when the data-link is created. In addition, properties can be set on any data-link through the set-linkprop subcommand. The former allows the administrator to create a VNIC with bandwidth control in a single command instead of having to go through a two-step dance.

>   Needing to create a "dummy" VNIC on top of a regular interface
>   just to interpose these new features seems like an implementation
>   artifact.

No, that won't be needed, see above.

> - I assume we need a redesign of the VLAN code in order to get
>   per-VLAN bandwidth control. Is that redesign part of Crossbow, or
>   is it some later project? In reading the archives, it seems that
>   it's been proposed as part of Crossbow, but in reading this
>   document it seems to be part of something else.

Yes, we're currently planning to move VLAN processing down to the MAC layer itself, and the VLAN processing currently in the DLS layer will be removed. This still needs to be properly documented.

> - If per-VLAN control appears, do the units of administration
>   change? Does it then become reasonable to talk about bandwidth
>   and CPU control using "set-linkprop"?

Yes, the properties will apply to VLAN data-links as well, see above.

> - Do bandwidth and CPU controls rely on squeues? If so, then VNICs
>   may not be able to control utilization from non-IP traffic, such
>   as with bridging.

There is a level of bandwidth control done by the squeue, but there is also bandwidth control done by the MAC layer itself. The latter is useful when bandwidth control is needed before fanout to multiple CPUs at the MAC layer, for non-IP protocols, or when the MAC is being used by a virtual machine's back-end drivers in the host OS. See also Sunay's writeup at http://www.opensolaris.org/os/project/crossbow/Design_softringset.txt for more details on this topic.

> - I'm not sure I understand the (undocumented? -- not in summary)
>   "-F" option for move-vnic. If I'm using a factory address on one
>   NIC and I move a VNIC to another NIC, does this cause the VNIC to
>   continue using the _same_ address but just on a new NIC?
>
>   If so, how is duplication avoided if that factory address is ever
>   reused from the original NIC?
>
>   I would have expected that a VNIC using a factory address would
>   just get a *new* address during a forced move to a new NIC.
>   Changing MAC address during reconfiguration doesn't seem like a
>   disaster to me -- in fact, it seems expected. Why should it try
>   to retain the address?

I was trying to let the system administrator minimize the impact on the existing MAC address assignment when a VNIC is moved off a device and then back to it. But I agree that it's not optimal.
If the folks on this list feel that changing the MAC address is not an issue, I have no problem with using the simpler scheme of assigning a new MAC address to the VNIC/MAC client.

> - For showing statistics with "show-vnic -s", are these the same as
>   "show-link -s"? If so, wouldn't the existing "show-link -s" do
>   the job?

Agreed, show-link -s should do fine here.

> - What do "up" and "down" mean? Are these equivalent to controlling
>   the "RUNNING" bit from user space (i.e., some way of marking link
>   up and link down manually)? Or are they something else? Should
>   regular MAC instances (other than VNICs) have the ability to be
>   set administratively up and down?
>
>   What would happen if VNICs were always "up?"

Here it means causing the VNIC MACs to register with the framework. The same functionality already exists for link aggregations. Meem suggested init-vnic instead, which would be fine with me and would avoid potential confusion with ifconfig up. I still need to update that part of the document.

> - What happens if a NIC is oversubscribed by the amount of bandwidth
>   configured for the VNICs? Is the result proportionate (and thus
>   "fair") allocation, or do they compete on some other grounds?

In that case it will depend on other factors such as the type of traffic, the CPU(s) processing that traffic, etc.

>   What kind of bandwidth control exists here? How granular is it,
>   and what effects do clients see from restricted bandwidth? Are
>   packets dropped (they have to be, if bandwidth limits apply to
>   forwarded traffic)? If so, is it tail drop or something more
>   sophisticated?

In general, if an SRS or flow is assigned its own hardware ring, then the polling thread polls packets directly from the ring, and the host does not drop any packets. Packets are polled from the ring when the bandwidth limit and current consumption allow it. The polling thread is scheduled every tick, and we compute a maximum number of bytes per tick.

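To make the per-tick budget concrete, here is a rough sketch in Python of the idea only, not of the actual Crossbow code. The 100 Hz tick rate, the function names, and the representation of the ring as a queue of packet sizes are all assumptions made for illustration.

```python
from collections import deque

# Assumed clock tick rate, for illustration only.
TICKS_PER_SECOND = 100


def bytes_per_tick(limit_bps, ticks_per_second=TICKS_PER_SECOND):
    """Convert a bandwidth limit in bits/s into a byte budget per tick."""
    return limit_bps // 8 // ticks_per_second


def poll_ring(ring, budget):
    """Pull packets (given as byte sizes) from the ring within the budget.

    Packets that do not fit in this tick's budget are left on the ring
    for a later tick; nothing is dropped here. A packet larger than the
    whole budget would never be pulled by this simplified loop.
    """
    taken = []
    used = 0
    while ring and used + ring[0] <= budget:
        size = ring.popleft()
        used += size
        taken.append(size)
    return taken


# Hypothetical example: a 100 Mbps limit and a ring holding 100 packets
# of 1500 bytes each.
budget = bytes_per_tick(100_000_000)
ring = deque([1500] * 100)
taken = poll_ring(ring, budget)
print(budget, len(taken), len(ring))
```

With these assumed numbers the budget is 125000 bytes per tick, so 83 packets are pulled during the tick and the remaining 17 stay on the ring until the next one.
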
If more than one SRS/squeue shares a ring, there's no polling of the ring. Instead, traffic is interrupt driven, and packets are deposited on queues associated with the SRS/squeue. Packets are then pulled from these queues based on bandwidth limits. If the maximum number of packets in these queues is exceeded, then there's tail drop. Again, see the SRS design doc.

> - Can a VNIC be built atop another non-anchor VNIC? (Seems like the
>   answer is "yes.")

Correct.

> - When VNICs share rings due to a lack of hardware resources, what
>   happens when the client of one VNIC is using polling and the
>   client of the other one is not?
>   Won't one client end up blanking the interrupts for another?

If there's one ring shared by multiple VNICs, traffic arrival is interrupt based, and after software classification, traffic is deposited on software rings.

If there are multiple hardware rings but only one interrupt, then the driver does not disable the hardware interrupt. Instead, it takes note of the request from the stack to not interrupt for specific rings. When a hardware interrupt is received, the driver avoids consuming packets from those rings, and continues delivering traffic from the other rings to the MAC layer. Again, see the document on SRS and bandwidth control for more details.

> - Instead of adding more arguments to mac_open() to handle priority
>   and bandwidth, I'd suggest making these separate calls. You'll
>   need the separate call anyway to implement the "modify" mechanism.

Having the parameters specified in mac_open() is useful because it allows them to be taken into account when the resources are allocated to the MAC client. This avoids allocating a set of default resources and then immediately changing them through a separate modify mechanism. If we can specify them through 2-3 arguments, I don't think this should be an issue.

> - What exactly does exclusive MAC access do? If mac_exclusive_set
>   is called, are other client requests blocked (sleeping)? Or are
>   they rejected (return error)? Or are they just let through, and
>   all clients are expected to bracket requests with exclusive
>   set/clear calls?

This is basically the equivalent of the mac_active_set()/mac_active_clear() we have in Nevada today. I'm looking into whether the same semantics could be implemented indirectly through mac_unicst_set() with the primary MAC address, since there's only one and it can be assigned to only one MAC client.

> - MAC_UNICAST_AUTO seems unnecessary to me. Why not just call first
>   with MAC_UNICAST_FACTORY and, if that fails, call again with
>   MAC_UNICAST_RANDOM? Doing that would even have better
>   functionality as MAC_UNICAST_AUTO seems to omit the possibility of
>   desiring a particular factory address when available.

The intent was for AUTO to allow the slot to be specified. That option should allow the slot number to be specified via addr_slot.

>   I think having MAC_UNICAST_AUTO in the mix ends up pushing some of
>   the control-path complexity out of the user space and into the
>   kernel. It'd be better to simplify the kernel parts.

This is very simple logic we're talking about here; I don't see a problem with doing that selection in kernel space. In addition, it avoids having two system calls per VNIC created on top of NICs which do not provide multiple factory MAC addresses.

> - What sorts of privileges are required to create and administer
>   VNICs? Are these things that can be delegated to non-global
>   zones?

Basically the same ones that are needed for administering other data-links, i.e. sys_net_config and net_rawaccess. In a zones environment, data-link administration is limited to the global zone.

> - Why is [V]NIC the right level of bandwidth control? If I want to
>   give a zone 100Mbps worth of bandwidth, but I'm giving it multiple
>   VNICs, how do I do that -- can the bandwidth control logic do
>   accounting based on multiple interfaces (aggregate control, rather
>   than individual interface control)?

No, the bandwidth control is on a per-interface or per-flow basis. This is because the bandwidth is basically controlled by polling on a per-ring (software or hardware) basis, not across a set of rings.

>   If I have application-level controls, such as HTTP virtual servers
>   or a sendmail configuration handling multiple domains, how can I
>   control bandwidth for those things? Won't the application need to
>   be involved?

For that you would use flowadm(1M), which we are also introducing as part of Crossbow and which will be described separately. My document focuses on the virtualization aspects of the project.

Nicolas.

-- 
Nicolas Droux - Solaris Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux