[crossbow-discuss] Crossbow Hardware Resources Management Design

Kais Belgaied Tue, 02 Oct 2007 16:09:13 -0700

Hi Deepti,

Thanks for the review. Answers are inline below.

deepti dhokte - Sun Microsystems - Menlo Park United States wrote:
> Hi,
> Kais and Roamer,
>
> This is a neat doc. I have a few questions/comments.
>   

Thanks!

> 1)
> This document describes polling a single ring.
> Can you poll a group of rings if they all share a common interrupt
> number?
>   

The issue with polling multiple rings is deciding which one to service
when some rings have received packets and others are empty. It's not
possible to know the order in which packets arrived across the various
rings, so we don't have enough information to honor that order up in
the stack. In this case we let the interrupt deposit the packets into
the SRS, and a worker thread then polls from the SRS's queue at a rate
prescribed by the bandwidth share.
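
To make the SRS path above a bit more concrete, here is a minimal
sketch in C. It is only an illustration under my own assumptions: the
names (srs_t, srs_enqueue, srs_poll), the fixed queue depth, and the
idea of expressing the bandwidth share as a per-pass byte budget are
all made up for this example and are not the actual Crossbow code.

```c
#include <assert.h>
#include <stddef.h>

#define SRS_QLEN 64                  /* hypothetical queue depth */

typedef struct {
    unsigned id;                     /* arrival sequence, for illustration */
    size_t   len;                    /* packet length in bytes */
} pkt_t;

typedef struct {
    pkt_t  q[SRS_QLEN];              /* circular FIFO of deposited packets */
    size_t head, tail, count;
} srs_t;

/* Interrupt path: deposit one packet. Returns 0, or -1 if the queue is full. */
static int srs_enqueue(srs_t *s, pkt_t p)
{
    assert(s != NULL);
    if (s->count == SRS_QLEN)
        return -1;
    s->q[s->tail] = p;
    s->tail = (s->tail + 1) % SRS_QLEN;
    s->count++;
    return 0;
}

/*
 * Worker path: take packets in arrival order until the next one would
 * exceed byte_budget (standing in for the bandwidth share) or max_out
 * packets have been taken. Returns the number of packets copied to out.
 */
static size_t srs_poll(srs_t *s, size_t byte_budget, pkt_t *out, size_t max_out)
{
    size_t n = 0;

    assert(s != NULL && out != NULL);
    while (s->count > 0 && n < max_out) {
        const pkt_t *p = &s->q[s->head];
        if (p->len > byte_budget)
            break;                   /* budget exhausted for this pass */
        byte_budget -= p->len;
        out[n++] = *p;
        s->head = (s->head + 1) % SRS_QLEN;
        s->count--;
    }
    return n;
}
```

Because the queue is a single FIFO filled by the interrupt path, the
worker never has to guess an ordering between rings; it just drains
whatever order the interrupts produced, within its budget.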

> mrg_intr is described as a "nice to have" per-group common interrupt
> number. Is it driver dependent? Or can the mac framework have a
> virtual interrupt that masks the individual interrupts of each ring
> of a given group?
>
>   

It is both driver and system dependent.
The preference is of course for the finest level of granularity, which
is an interrupt per ring. If the device cannot generate a per-ring
interrupt, or if the device driver failed to allocate an MSI-X
interrupt for each ring, then it is expected to fall back to the next
best granularity, which is a per-group interrupt. If that fails too,
interrupts can be shared between multiple groups.
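
The fallback order above could be sketched roughly as follows. This is
an assumption-laden illustration, not driver code: intr_caps_t, its
fields, and choose_intr_granularity are hypothetical names, and a real
driver would learn these facts from its hardware and from the system's
interrupt allocation interfaces rather than from a plain struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    INTR_PER_RING,                   /* finest: one interrupt per ring */
    INTR_PER_GROUP,                  /* next best: one interrupt per group */
    INTR_SHARED                      /* last resort: shared across groups */
} intr_gran_t;

/* Hypothetical summary of what the device and the system can offer. */
typedef struct {
    bool     dev_per_ring_intr;      /* device can raise a distinct intr per ring */
    unsigned msix_granted;           /* MSI-X vectors actually allocated */
    unsigned nrings;
    unsigned ngroups;
} intr_caps_t;

/* Pick the finest interrupt granularity that can actually be satisfied. */
static intr_gran_t choose_intr_granularity(const intr_caps_t *c)
{
    assert(c != NULL);
    if (c->dev_per_ring_intr && c->msix_granted >= c->nrings)
        return INTR_PER_RING;
    if (c->msix_granted >= c->ngroups)
        return INTR_PER_GROUP;
    return INTR_SHARED;
}
```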

> 2)
> If a hardware/network driver does not have ring support, can crossbow
> emulate channel/FIFO/ring behavior in software for such drivers?
> Would the SRS serve that purpose?
>   

Yes, the SRS and the ring members of an SRS will serve that purpose.
Note that the driver will expose one singleton group in that case.

> 3)
> I see there is mac_rx_ring_info_t and mac_rx_ring_group_info_t.
> how about if you have common structures for rx and tx side for info?
> Instead of having  mac_rx_ring_info_t and mac_rx_ring_group_info_t
> would it make sense to have mac_ring_info_t and mac_ring_group_info_t
> to be usable for rx and tx side rings or ring-groups.?
>   

mac_rx_ring_info_t and mac_tx_ring_info_t are objects of a different
nature. Different functions act on them, the actions are different,
and the arguments are different. Roamer and I discussed this quite a
bit during the design, and it didn't feel natural to force a common
type on them just for the sake of having compact code.
On the other hand, we do have a common mac_capab_rings_t, because that
object is used the same way in both the rx and tx directions: simply
for exchanging the opaque handles for rx and tx rings, and pointers to
their more specific info structs. We opted for type commonality in that
case. The first paragraph of the Provider Interface section was an
attempt to capture that rationale.
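
Roughly, the split looks like the sketch below. Please treat it as an
illustration only: the sketch_* type names, the field layout, and the
function signatures are my assumptions for this example and do not
reproduce the definitions in the design doc. The point is just that the
per-direction info structs carry different entry points, while the
capability object has the same shape in both directions.

```c
#include <assert.h>
#include <stddef.h>

typedef void *ring_handle_t;         /* opaque driver handle (hypothetical) */

/* rx-specific info: the framework pulls packets out of the ring. */
typedef struct {
    ring_handle_t mr_driver;
    int (*mr_poll)(ring_handle_t ring, int max_bytes);   /* assumed signature */
} sketch_rx_ring_info_t;

/* tx-specific info: the framework pushes packets into the ring. */
typedef struct {
    ring_handle_t mr_driver;
    int (*mr_send)(ring_handle_t ring, const void *pkt, size_t len); /* assumed */
} sketch_tx_ring_info_t;

typedef enum { RING_RX, RING_TX } ring_dir_t;

/* Common capability object: used the same way for either direction. */
typedef struct {
    ring_dir_t    dir;
    ring_handle_t handle;            /* opaque handle being exchanged */
    union {
        sketch_rx_ring_info_t *rx;   /* valid when dir == RING_RX */
        sketch_tx_ring_info_t *tx;   /* valid when dir == RING_TX */
    } info;
} sketch_capab_rings_t;

static sketch_capab_rings_t sketch_capab_rx(sketch_rx_ring_info_t *rx)
{
    sketch_capab_rings_t c;

    assert(rx != NULL);
    c.dir = RING_RX;
    c.handle = rx->mr_driver;
    c.info.rx = rx;
    return c;
}

static sketch_capab_rings_t sketch_capab_tx(sketch_tx_ring_info_t *tx)
{
    sketch_capab_rings_t c;

    assert(tx != NULL);
    c.dir = RING_TX;
    c.handle = tx->mr_driver;
    c.info.tx = tx;
    return c;
}

/* Stand-in driver entry points, for demonstration only. */
static int demo_poll(ring_handle_t ring, int max_bytes)
{
    (void)ring;
    return max_bytes;                /* pretend we filled the whole budget */
}

static int demo_send(ring_handle_t ring, const void *pkt, size_t len)
{
    (void)ring;
    (void)pkt;
    return (int)len;                 /* pretend every byte was queued */
}
```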

> e.g. To implement above you can have mac_cb function pointer
> in mac_ring_info_t , and say -
> 1) for rx side  "mac_cb" can be initialized as "mr_poll" and
> 2) on Tx side "mac_cb" can be initialized as "mr_send"
> since mr_driver, mr_intr, mr_start, mr_stop are members of
> mac_rx_ring_info_t as well as mac_tx_ring_info_t and it's just that
> mr_poll and mr_send routines are different for rx and tx side ring_info
> respectively.
>
>
> 4)
> As far as I understand, these hardware resource capabilities can help
> with load balancing/packet classification. How can they help
> virtualization?
>   

Good question. The ability to split traffic into independent lanes
helps share access to the hardware resources in an isolated manner.
When you have a ring group that has its own MAC address and
interrupt(s), you get to assign that interrupt to a CPU that was given
to a virtual machine. That's isolation in terms of scheduling
resources, because even an avalanche of interrupts targeting that VM's
address will have little effect on the CPU resources allocated to
others. On the transmit side, the core MAC framework will submit
packets to the right tx ring associated with a specific MAC client
(e.g. a VNIC given to a VM), and will not use other clients' tx rings.
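
For the transmit side, a toy sketch of "each client only ever uses its
own tx ring" might look like this. The names (tx_ring_t,
mac_client_sketch_t, client_tx) are hypothetical and the ring is
reduced to a counter; it is not how the framework is implemented.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned long sent;              /* packets submitted to this ring */
} tx_ring_t;

typedef struct {
    const char *name;                /* e.g. a VNIC given to a VM */
    tx_ring_t  *tx_ring;             /* ring reserved for this client */
} mac_client_sketch_t;

/* Submit one packet on behalf of a client: only its own ring is touched. */
static void client_tx(mac_client_sketch_t *c)
{
    assert(c != NULL && c->tx_ring != NULL);
    c->tx_ring->sent++;
}
```

With two clients bound to two different rings, traffic from one client
never shows up in the other client's ring counters.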

I think some elaboration is needed in the text here.

> Because, as I see it, virtual machines are identified using MAC+IP
> addresses. Are there any userland utilities that can help classify
> and administer a VM's traffic and steer it across multiple rings by
> programming a policy/rule on the ring(s)?
>   

Yes. At the end of the day that is flowadm(1m), which may result in
programming the hardware classifier to steer based on a rule (e.g. IP
address or port) or a policy (hash function).

> If so, what is it, and how can a user enforce a policy dynamically on
> a given set of rings or ring groups? I know flowadm can program a
> ring, but can it program a ring group?
>   

The generalized load-balancing policy (generalized from the existing
aggr policy) is currently the only way to alter the behavior at the
level of the ring group, and that is done using dladm(1m).

> 5)
> Can you group rings of different physical NICs? If yes, what is the
> interface for that?
>   

I need to think about this one. The real scope of the question is how
to make the aggr driver work efficiently and make the best use of the
virtualization capabilities of its members.
Maybe we can have an open crossbow design meeting about this, if
members of this audience wish to participate.

Thanks,

    Kais
> -Deepti
>
>
