Hi Deepti, thanks for the review. Answers below.

deepti dhokte - Sun Microsystems - Menlo Park United States wrote:
> Hi,
> Kais and Roamer,
>
> This is neat doc. I have few questions/comments.
>

Thanks!

> 1)
> This documents describes polling single ring.
> Can you poll group of rings, if they all share common interrupt number?
>

The issue when polling multiple rings is deciding which one to choose when some rings have received packets and some are empty. It's not possible to know the order in which packets arrived on the various rings, so we don't have enough information to honor that order up in the stack. In this case we let the interrupt deposit the packets up to the SRS, then a worker thread polls from the SRS's queue at a rate prescribed by the bandwidth share.

> mrg_intr is described to be "nice to have" per group based common
> interrupt number, Is it driver dependent? or the mac framework
> can have virtual interrupt that masks individual interrupts of each
> individual ring of given group?
>
>

It is both driver and system dependent. The preference is of course for the finest level of granularity, which is an interrupt per ring. If the device doesn't know how to generate a per-ring interrupt, or if the device driver failed to allocate an MSI-X interrupt number for each ring, then it is expected to fall back to the next best granularity, which is a per-group interrupt. If that fails, then interrupts can be shared between multiple groups.

> 2)
> If any hardware/network driver does not have ring support,
> can crossbow for such drivers emulate channel/Fifo/ring behavior in
> software?
> Does SRS would serve that purpose?
>

Yes, the SRS and the ring members of an SRS will serve that purpose. Note that the driver will expose one singleton group in that case.

> 3)
> I see there is mac_rx_ring_info_t and mac_rx_ring_group_info_t.
> how about if you have common structures for rx and tx side for info?
> Instead of having mac_rx_ring_info_t and mac_rx_ring_group_info_t
> would it make sense to have mac_ring_info_t and mac_ring_group_info_t
> to be usable for rx and tx side rings or ring-groups.?
>

mac_rx_ring_info_t and mac_tx_ring_info_t are objects of a different nature. Different functions act on them, the actions are different, and the arguments are different. Roamer and I discussed this quite a bit during the design, and it didn't feel natural to force a common type on them just for the sake of having compact code.
We do have a common mac_capab_rings_t, on the other hand, because that object is used the same way for both the rx and tx directions: simply for exchanging the opaque handles for rx and tx rings, and pointers to their more specific info structs. We opted for a common type in that case.
The first paragraph of the Provider Interface section was an attempt to capture that rationale.

> e.g. To implement above you can have mac_cb function pointer
> in mac_ring_info_t , and say -
> 1) for rx side "mac_cb" can be initialized as "mr_poll" and
> 2) on Tx side "mac_cb" can be initialized as "mr_send"
> since mr_driver, mr_intr, mr_start, mr_stop are members of
> mac_rx_ring_info_t as well as mac_tx_ring_info_t and it's just that
> mr_poll and mr_send routines are different for rx and tx side ring_info
> respectively.
>
>
> 4)
> AFAI understand these hardware resource capabilities can help do
> load balancing/packet classification , how it can help virtualization?
>

Good question. The ability to split traffic into independent lanes helps share access to the hardware resources in an isolated manner. When you have a ring group that has its own MAC address and interrupt(s), you get to assign that interrupt to a CPU that was given to a virtual machine. That's isolation in terms of scheduling resources, because even an avalanche of interrupts targeting that VM's address will have little effect on the CPU resources allocated to others.
On the transmit side, the core MAC framework will submit packets to the tx ring associated with a specific MAC client (e.g. a VNIC given to a VM), and will not use other clients' tx rings.
I think some elaboration is needed in the text here.

> cause, As I see, virtual machines are identified using MAC+IP addresses,
> Is there any userland utilities that can help steer, classify and
> administer
> VM's traffic and steer across multiple rings by programming policy/rule on
> ring/s?
>

Yes. At the end of the day that is flowadm(1m), which may result in programming the hardware classifier to steer based on a rule (e.g. IP addr or port) or a policy (hash function).

> Is so, what is it and how user can enforce a policy dynamically on given
> set of rings or ring groups? I know flowadm can program ring
> but can it program ring-group?
>

The generalized load balancing policy (generalized from the existing aggr policy) is currently the only way to alter the behavior at the level of the ring group, and that is done using dladm(1m).

> 5)
> Can you group rings of different physical NICs, If yes, what is the
> interface
> for the same?
>

I need to think about this one. The scope of the question is actually how to make the aggr driver work efficiently and best utilize the virtualization capabilities of its members. Maybe we can have an open crossbow design meeting about this, if members of this audience wish to participate.

Thanks,
Kais

> -Deepti
>
>
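
P.S. The interrupt fallback described under (1) can be sketched roughly in C as below. This is only an illustration under my own assumptions: intr_level_t, pick_intr_level() and its parameters are made-up names, not part of the MAC framework or of any driver, and a real driver would have more to consider than vector counts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; not MAC framework or driver APIs. */
typedef enum {
	INTR_PER_RING,	/* one interrupt per ring (preferred) */
	INTR_PER_GROUP,	/* one interrupt for all rings of a group */
	INTR_SHARED	/* interrupts shared between multiple groups */
} intr_level_t;

/*
 * Pick the finest interrupt granularity that is available.
 *   hw_per_ring: the device can generate a per-ring interrupt
 *   nvec:        number of MSI-X vectors the driver managed to allocate
 *   nrings:      total number of rings exposed
 *   ngroups:     total number of ring groups exposed
 */
static intr_level_t
pick_intr_level(bool hw_per_ring, int nvec, int nrings, int ngroups)
{
	assert(ngroups > 0 && nrings >= ngroups);

	/* Best case: the device supports it and every ring got a vector. */
	if (hw_per_ring && nvec >= nrings)
		return (INTR_PER_RING);

	/* Next best: at least one vector per group. */
	if (nvec >= ngroups)
		return (INTR_PER_GROUP);

	/* Last resort: groups have to share interrupts. */
	return (INTR_SHARED);
}
```

For example, with 8 rings in 2 groups, 8 allocated vectors give per-ring interrupts, 4 vectors fall back to per-group, and a single vector means the groups share it.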