Kais Belgaied wrote:

>Hi Deepti,
>
>thanks for the review.
>  
>
you are welcome!

>answers below
>
>  
>
>>1)
>>This document describes polling a single ring.
>>Can you poll a group of rings if they all share a common interrupt number?
>>  
>>    
>>
>
>
>The issue when polling multiple rings is deciding which one to choose 
>when some rings have
>received packets and some are empty.
>  
>
Aha, so the issue is deciding which ring in a group to drain.

But given the rate at which these 10GbE NICs are loaded in
data centers, I think the common case would be that every
ring of the group always has some packets waiting.

>It's not possible to know the order of incoming packets to various 
>rings, 
>
So out-of-order packets can be an issue, I agree.

>therefore we don't have
>enough information to honor that order up in the stack.
>  
>
Can we drain the rings of a group in round-robin fashion? That is,
we remember the last ring drained and, the next time we service the
group's interrupt, we drain packets off the next ring of the same
group: 1, 2, 3, 4, 1, 2, etc.
But yeah, that won't solve the out-of-order packet issue.
I would like to think about this a bit more.
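
A rough sketch of the round-robin idea, just to make it concrete. The
types ring_t and ring_group_t, the pending counter, and NRINGS are all
made up for illustration; they are not the mac framework's structures.
The group remembers where to start next time and skips empty rings.

```c
#include <assert.h>
#include <stddef.h>

#define NRINGS 4    /* hypothetical number of rings in the group */

/* Hypothetical stand-ins, not the real mac framework types. */
typedef struct ring {
    int pending;            /* packets waiting on this ring */
} ring_t;

typedef struct ring_group {
    ring_t rings[NRINGS];
    size_t next;            /* ring to try first on the next interrupt */
} ring_group_t;

/*
 * Pick the next non-empty ring, starting with the ring after the one
 * drained last time. Returns the ring index, or -1 if all are empty.
 */
static int
group_pick_ring(ring_group_t *g)
{
    assert(g != NULL);
    for (size_t i = 0; i < NRINGS; i++) {
        size_t idx = (g->next + i) % NRINGS;
        if (g->rings[idx].pending > 0) {
            g->next = (idx + 1) % NRINGS;
            return ((int)idx);
        }
    }
    return (-1);
}

/* Take one packet off the chosen ring; returns the ring index or -1. */
static int
group_drain_one(ring_group_t *g)
{
    int idx = group_pick_ring(g);

    if (idx >= 0)
        g->rings[idx].pending--;
    return (idx);
}
```

This only gives fairness between rings; it says nothing about the order
in which packets arrived across rings, so the ordering problem remains.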

>We let the interrupt deposit the packets up to the SRS in this case, 
>then a worker thread polls
>from the SRS's queue at a rate prescribed by the bandwidth share.
>  
>
This is neat.

>  
>
>>mrg_intr is described as a "nice to have" common per-group
>>interrupt number. Is it driver dependent? Or can the mac framework
>>provide a virtual interrupt that masks the individual interrupts of
>>each ring of a given group?
>>
>>  
>>    
>>
>
>It is both driver and system dependent.
>The preference is to the finest level of granularity of course, which is 
>an interrupt per ring.
>  
>
I think that would be complex, and I don't see how useful it is,
because the purpose of grouping would be to load-balance the same traffic.

Although I see your point: since we have no reliable way of knowing
which ring of the group has packets to drain, an individual interrupt
per ring gives the finer-grained knowledge.
But again, as per my comment above, it's likely that
each ring of the group would always have some packets.
Also, grouping will be used in heavy-traffic situations.

>If the device doesn't know how to generate a per-ring interrupt, or if 
>the device driver failed to
>allocate an MSI-X interrupt number for each ring, then it is expected to 
>fall back to the next best granularity
>which is per-group interrupt. If that fails, then interrupts can be 
>shared between multiple groups.
>
>  
>
A per-group interrupt is something I am convinced about.
Sharing the same interrupt across multiple groups (each group with
multiple rings) seems not very useful to me.
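
For reference, the fallback order described in the quoted text could be
sketched roughly as below. Everything here is an assumption for
illustration: intr_caps_t, its fields, and choose_intr_granularity() are
invented names, and the idea that "enough vectors" is the only test is
a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    INTR_PER_RING,      /* finest: one interrupt per ring */
    INTR_PER_GROUP,     /* next best: one interrupt per ring group */
    INTR_SHARED         /* last resort: shared between groups */
} intr_gran_t;

/* Hypothetical summary of what the device and driver managed to set up. */
typedef struct intr_caps {
    bool per_ring_capable;  /* device can raise a per-ring interrupt */
    int vectors_avail;      /* MSI-X vectors actually allocated */
} intr_caps_t;

/* Walk the fallback chain from finest to coarsest granularity. */
static intr_gran_t
choose_intr_granularity(const intr_caps_t *c, int nrings, int ngroups)
{
    assert(c != NULL && nrings > 0 && ngroups > 0);
    if (c->per_ring_capable && c->vectors_avail >= nrings)
        return (INTR_PER_RING);
    if (c->vectors_avail >= ngroups)
        return (INTR_PER_GROUP);
    return (INTR_SHARED);
}
```

With 8 rings in 2 groups, 8 vectors would give per-ring interrupts, 4
vectors would fall back to per-group, and a single vector would end up
shared.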

2)

>>If any hardware/network driver does not have ring support,
>>can crossbow for such drivers emulate channel/Fifo/ring behavior in 
>>software?
>>Would SRS serve that purpose?
>>  
>>    
>>
>
>yes, SRS and ring members of an SRS will serve that purpose.
>Note that the driver will expose one singleton group in that case.
>
>  
>
great.

>>3)
>>I see there is mac_rx_ring_info_t and mac_rx_ring_group_info_t.
>>how about if you have common structures for rx and tx side for info?
>>Instead of having  mac_rx_ring_info_t and mac_rx_ring_group_info_t
>>would it make sense to have mac_ring_info_t and mac_ring_group_info_t
>>to be usable for rx and tx side rings or ring-groups.?
>>  
>>    
>>
>
>
>mac_rx_ring_info_t and mac_tx_ring_info_t
>
>are objects of different nature. Different functions act on them. The 
>actions are different,
>and the arguments are different.
>
I may be missing something.
I think mac_rx_ring_info_t and mac_tx_ring_info_t differ only in their last
member.

> Roamer and I discussed this quite a bit 
>during the design,
>and it didn't feel natural to force a commonality of the types on them 
>just for the sake of having
>compact code.
>We do have a common mac_capab_rings_t on the other hand, because that 
>object is used the same way
>  
>
Yeah, that's why I wondered why ring_info_t cannot be the same for the TX and
RX sides.

>for both rx and tx direction, simply for exchanging the opaque handles 
>for rx and tx rings, and
>pointers to their more specific info structs. We opted for type 
>commonality in that case.
>The first paragraph of the Provider Interface section was an attempt to 
>capture that rationale.
>
>  
>
Yes, that's what got me thinking: why not the same ring_info_t?

>>e.g. To implement above you can have mac_cb function pointer
>>in mac_ring_info_t , and say -
>>1) for rx side  "mac_cb" can be initialized as "mr_poll" and
>>2) on Tx side "mac_cb" can be initialized as "mr_send"
>>since mr_driver, mr_intr, mr_start, mr_stop are members of
>>mac_rx_ring_info_t as well as mac_tx_ring_info_t and it's just that
>>mr_poll and mr_send routines are different for rx and tx side ring_info
>>respectively.
>>
>>
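
A rough sketch of what the mac_cb suggestion above might look like. The
member names mr_driver, mr_intr, mr_start, mr_stop and the mac_cb idea
come from this thread; everything else (sketch_ring_info_t, mr_dir, the
field types, the poll/send signatures, and the toy_* functions) is
assumed for illustration, since the real prototypes are not spelled out
here. Because poll and send take different arguments, the sketch uses a
union tagged by direction rather than one untyped pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { RING_RX, RING_TX } ring_dir_t;

/* Hypothetical signatures; the real mr_poll/mr_send may differ. */
typedef int (*ring_poll_fn)(void *driver, int max_pkts);
typedef int (*ring_send_fn)(void *driver, const void *pkt);

/* One info struct for both directions (illustrative only). */
typedef struct sketch_ring_info {
    void *mr_driver;                /* opaque driver handle */
    int mr_intr;                    /* placeholder type for the interrupt */
    int (*mr_start)(void *driver);
    void (*mr_stop)(void *driver);
    ring_dir_t mr_dir;              /* says which mac_cb member is valid */
    union {
        ring_poll_fn poll;          /* rx side: plays the role of mr_poll */
        ring_send_fn send;          /* tx side: plays the role of mr_send */
    } mac_cb;
} sketch_ring_info_t;

/* Toy driver entry points so the sketch can be exercised. */
static int toy_start(void *d) { (void)d; return (0); }
static void toy_stop(void *d) { (void)d; }
static int toy_poll(void *d, int max_pkts) { (void)d; return (max_pkts); }
static int toy_send(void *d, const void *pkt)
{
    (void)d;
    return (pkt != NULL ? 0 : -1);
}

/* Fill in the common members, then the direction-specific callback. */
static void
ring_info_init(sketch_ring_info_t *ri, void *driver, ring_dir_t dir)
{
    assert(ri != NULL);
    ri->mr_driver = driver;
    ri->mr_intr = 0;
    ri->mr_start = toy_start;
    ri->mr_stop = toy_stop;
    ri->mr_dir = dir;
    if (dir == RING_RX)
        ri->mac_cb.poll = toy_poll;
    else
        ri->mac_cb.send = toy_send;
}
```

The cost of sharing the type shows up at every call site, which has to
check mr_dir before using mac_cb; that may be part of why separate rx
and tx types felt more natural.
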
>>4)
>>As far as I understand, these hardware resource capabilities help with
>>load balancing/packet classification; how can they help virtualization?
>>  
>>    
>>
>
>good question. The ability to split traffic into independent lanes helps 
>sharing access to the
>hardware resources in an isolated manner. When you have a ring group 
>that has its own
>MAC address and interrupt(s), you get to assign that interrupt to a CPU 
>that was given
>to a virtual machine. That's isolation in terms of scheduling resource, 
>because even an avalanche
>of interrupts targeting that VM's address will have little effect on CPU 
>resources allocated to
>others. On the transmit side, the core MAC framework will be submitting 
>packets to the
>right tx ring associated with a specific MAC client (e.g. a VNIC given 
>to a VM), and not
>using other clients tx rings.
>
>I think some elaboration is needed in the text here.
>  
>
>>Because, as I see it, virtual machines are identified using MAC+IP addresses.
>>Are there any userland utilities that can help classify and
>>administer
>>a VM's traffic and steer it across multiple rings by programming a policy/rule on
>>the ring(s)?
>>  
>>    
>>
>
>yes, at the end of the day, flowadm(1m) that may result in programming 
>the hardware classifier
>for steering based on a rule (e.g. IP addr or port) or policy (hash 
>function).
>
>  
>
>>If so, what is it, and how can a user enforce a policy dynamically on a given
>>set of rings or ring groups? I know flowadm can program a ring,
>>but can it program a ring group?
>>  
>>    
>>
>
>the generalized load balancing policy (generalized from the existing 
>aggr policy) is currently
>the only way to alter the behavior at the level of the ring group, and 
>that's using dladm(1m).
>
>  
>
>>5)
>>Can you group rings of different physical NICs? If yes, what is the 
>>interface
>>for doing so?
>>  
>>    
>>
>
>I need to think about this one. The scope of the question is actually 
>how to make the aggr driver
>work efficiently and best utilize the virtualization capabilities of its 
>members.
>Maybe we can have an open crossbow design meeting about this, if members 
>of this audience wish to
>participate.
>
>  
>
I was thinking about a use case for why we would need
such a thing.

Say a multicast streaming/video application runs in several
virtual machines, all listening to the same multicast address. The
stream of multicast packets could be classified and deposited on the
rings of a group that is made up of rings from different physical
NICs/VNICs. High streaming quality and high throughput could be
achieved, and the multicast listener apps in all the VMs could be
configured in one step by programming that ring group, if ring groups
are programmable using flowadm.

But this is the view from 100 feet up and is on my wish list :-) and it
has a very peculiar requirement and corner-case usage...

But if it's easily doable, already available in hardware+driver, and
just one API away in the mac framework, I wondered whether we could
have it in crossbow.

Thanks,
Deepti



>Thanks,
>
>    Kais
>  
>
>>-Deepti
>>
>>  
>>    
>>
>
>_______________________________________________
>crossbow-discuss mailing list
>crossbow-discuss at opensolaris.org
>http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss
>  
>

