Hi Nicolas,

Sorry for the late response. Please see some more questions/comments below.

<..snip..>

>> Q1.2) On pg 38, there is a reference to the following flags,
>> but which interface takes them as an argument?
>>
>>     MAC_OPEN_FLAGS_FORCE_MULTI_RINGS
>>     MAC_OPEN_FLAGS_FORCE_ONE_RING
>>
>> It seems like these are an argument to mac_client_open(),
>> but there is a reference to mac_open() in the description, see below:
>>
>> "If MAC_OPEN_FLAGS_FORCE_MULTI_RINGS flag is set and it is not
>> possible to allocate mbc_ncpus hardware rings, the mac_open()
>> call will fail, otherwise the MAC layer will attempt to reserve
>> one hardware ring for the MAC client."
>
> These flags are specified when calling mac_client_open(), not
> mac_open().

I guess the above text will get fixed in a subsequent revision.

>> Q1.3) Are there any other flags other than the following ones?
>>
>>     MAC_OPEN_FLAGS_FORCE_MULTI_RINGS
>>     MAC_OPEN_FLAGS_FORCE_ONE_RING
>
> No.

Is there a reason this is tied to hardware rings? We would like the
mac_client_open() request to be extended so that a client can get
either all software rings, a mix of hardware and software rings, or
all hardware rings, in order to match the number of CPUs specified in
the client_open call. A flag can be specified for this.

Also, according to the explanation in the doc at page 38, there is
also a case where no flags are specified. It seems that, if no flags
are specified, the MAC layer will attempt to reserve one hardware
ring. The open does not seem to fail even if that reservation fails,
but this is not clearly specified.

>> - Is there a way to force a software ring?
>
> Do you mean not assign a hardware ring? I think this is something
> we could add, yes.

This is related to the above. Can you add a flag that we can use to
indicate that the client wants to use a single ring or multiple rings,
without forcing hardware rings? That way, even when the underlying
device does not have enough hardware rings, a client can get a soft
ring per CPU.

The above comment applies to this one also. The behavior without any
flags seems to be to attempt to reserve one h/w ring.
What is the failure case?

>> Q1.4) Is the mbc_cpus in mac_bind_cpus_t an array of CPU ids?
>
> Yes.
>
>> Q1.5) The following description of mbc_cpus on pg 37 is not clear,
>> especially for the non-NULL case.
>>
>> "If mbc_cpus is NULL, the MAC layer will pick the CPUs.
>> If mbc_cpus is non-NULL, the MAC layer will chose the CPUs.".
>
> The first one is correct. If mbc_cpus is non-NULL, the MAC layer
> will assign the CPUs provided by the caller.

When mbc_cpus is NULL, what determines how many CPUs, and hence how
many rings, are available to this client?

>> Q1.6) What is the relationship between Unicast addresses (multiple
>> unicast set via mac_unicast_add()), Rings and CPUs?
>>
>> - Is there a 1:1 relation between a unicast address and a ring?
>> - Is there a 1:1 relation between a ring and CPU?
>
> Neither. The MAC addresses will share the same rings and CPUs.

But since you are allowing multiple MAC addresses to be associated
with a client, can we add support, as part of the unicast_add call, to
indicate that each of these addresses should be associated with its
own ring (either HW or SW)?

>> - The Rings and CPUs are tightly coupled in this interface.
>> How can allocate multiple rings even when there is one CPU (or
>> less number of CPUs).
>
> You don't allocate rings explicitly, you express a level of
> parallelism instead, the framework distributes the hardware rings
> transparently.

But the only way we can control this parallelism is by specifying the
number of CPUs in the domain. In a system capable of adding and
removing CPUs dynamically, we might want to change the parallelism
level too. The current APIs don't allow changing this. We will need a
way to specify this, either as an extension to client_open or via a
new API call.

Also, the document states that if mbc_ncpus HW rings cannot be
allocated the open will fail. As I mentioned earlier, it would be nice
if we could get software rings in this case.

Also, in terms of parallelism, is this specified by the number
of CPUs or by the unique CPU IDs in the array? What happens if I
specify ncpus where all the IDs are the same: do I get ncpus HW rings
if they are available? Also, can we then change the ring-to-CPU
mapping when CPUs are added to or removed from the domain?

>> - When there are multiple CPUs and multiple unicast addresses,
>> is there address fanout per CPU?
>
> See 2 answers above.

This will be a very useful feature, as it will allow clients to
associate each ring with a MAC address. Currently the only way to do
this is to make separate mac_client_open() calls, associate each
client with a ring, and then bind it to a MAC address.

>> Q1.7) How is the binding of CPUs via mac_bind_cpus_t coordinated
>> with CPU DR (on the platforms that support them)?
>
> The MAC layer will be notified of the removal of the CPU and will
> stop using it for its worker threads and interrupts.

That is purely error handling. We need to be able to use more CPUs and
improve the level of parallelism when CPUs are added, and the reverse
when CPUs are removed. When the MAC layer is notified about CPUs going
away, does it remove the rings associated with those CPUs?

>> NOTE: CPU DR is already a supported feature on LDoms.
>>
>> Q1.8) LDoms requires the CPU binding to be changed dynamically,
>> how can this be accomplished ?
>
> This cannot be done with the API as documented today. It seems that
> you are looking for a call to change the set of CPUs assigned to
> the MAC client, is that what you are asking for?

See Q1.7.

<..snip..>

>> Q1.10) Can the mac client interface be extended to support creating
>> a client based on ether_type? This is required for mac clients
>> like fiberchannel over ethernet.
>
> No, each MAC client corresponds to a MAC level entity which is
> defined by its MAC address. Multiple ether types can be supported
> on top of a MAC client.

Devices like the Niagara2 NIU allow classification of packets using
parameters like the ether_type.
How can a mac_client take advantage of such functionality?

<..snip..>

>> Q2.1) The section 4.5 describes "By value" type which is used
>> to set a specific MAC address by the MAC client. But there
>> is no equivalent addr_type definition under mac_unicast_add()
>> interface.
>
> MAC_UNICAST_VALUE is missing from the list, this is what you are
> looking for.

I presume this will be documented in the next revision.

>> NOTE: LDoms requires the MAC addresses that are allocated
>> by LDom manager be used by the network device. So, LDoms
>> will not use any other addr_type other than "By value" type.
>
> That's fine.
>
>> Q2.2) Is there an impact to the multiaddress_capab_t.maddr_add()/
>> maddr_remove() interfaces? Are these being obsoleted or
>> going away?
>
> The capability will stay, and the framework will continue to use
> that capability to query and control the allocation of MAC address
> slots. However that interface is not intended to be used by drivers
> which should use the MAC client interfaces instead.

OK.

>> Q2.3) A system with many domains (aka LDoms) with virtual network
>> devices, it requires the use of a large number of layer2 addresses,
>> this will exhaust h/w slots available on most standard NICs.
>> How can a client take advantage of layer2 filtering provided by
>> NICs like NII-NIU/Neptune. Specifically, this will help in
>> avoiding the programming of the device into promiscuous mode
>> etc. Currently there are no interfaces that seem to provide
>> such ability.
>
> Yes, this is a situation we are aware of. We've talked on this list
> about having multiple VNICs sharing the same MAC address, and
> identified by their IP address instead. However this needs to be
> scoped and defined further before we can commit on providing that
> functionality.

The current APIs only allow adding as many addresses as there are
slots available. Beyond that, the adapter is put in promisc mode.
Instead, can you add the capability to specify when to use a filter
and when to take up a slot in the HW?

>> Q2.4) Clients will need the ability to specify if mac_unicast_add()
>> is allowed to go into promiscuous mode or not. An error return
>> value is required if no h/w mac address slot is available.
>
> OK, I will add a flag.

Thanks.

<..snip..>

>> Q2.6) Can it be assumed that every address added to a client is
>> processed in a separate ring (either h/w ring or s/w ring)?
>
> No, all the MAC addresses for a client will share the same ring(s).
> If there's a need to have a different set of rings associated with
> a MAC address, then a different MAC client should be created.

What happens when a single client has multiple rings and multiple MAC
addresses? How is the mapping done in that case? Would it be possible
in that case to request a 1-to-1 mapping and reserve a ring for each
address?

>> Q2.7) How are the multiple addresses per client maintained, is it
>> done in the MAC layer or does it bypass the MAC layer and passed
>> to h/w directly.
>
> Since the action of reserving the MAC address is triggered by a
> call to the MAC layer, the MAC layer cannot be bypassed. The MAC
> layer will use the multiple MAC address capability exposed by the
> driver to reserve a new MAC address slot.

What if the driver does not expose that capability? Will the
unicast_add call fail? Is the MAC layer essentially reflecting the
capability of the underlying hardware, or does it provide the ability
to have multiple addresses irrespective of whether the HW has multiple
slots?

>> Q2.8) Can unlimited number of mac addresses be assigned to a MAC
>> client? What are the software/hardware features that limit
>> this?
>
> Memory that can be allocated by the kernel.

So even if the underlying device runs out of slots, the MAC layer will
maintain all the addresses associated with that client.
How does it then manage and associate these addresses with the rings
allocated for this client? What does it do in both software and
hardware to filter the addresses for this client? Which addresses get
HW slots and which don't? And if you run out of slots, does the HW go
to promisc mode?

>> 3) Rings related:
>> (Crossbow-virt.pdf Section 5.3 Pg 43)
>> mac_rint_t *mac_rings_list_get(mac_client_handle_t mch,
>>     uint_t nrings);
>> void mac_rings_list_free(mac_rings_t *rings, uint_t nrings);
>> uint16_t mac_ring_get_flags(mac_ring_t ring);
>>
>> QUESTIONS:
>>
>> Q3.1) All of these interfaces are now categorized as project-private
>> API. What motivated this change. These interfaces need to be
>> more open.
>
> The MAC layer will do the allocation of hardware resources to the
> various MAC clients and their flows. Instead of having each MAC
> client manage its own set of resources, the resources are allocated
> to MAC clients based on their needs, for example the degree of
> parallelism expressed through mac_client_open(). If you have
> specific functional requirements that are not satisfied by the
> current document, please list them.

Currently rings are hidden resources entirely managed by the MAC
layer, and clients have no visibility into them. All the client gets
to do is request a degree of parallelism. Providing APIs that allow
clients to see how rings were allocated would be useful.

>> Q3.2) The mac_rings_list_get() is only for h/w rings, is there
>> an equivalent interface to obtain s/w ring information.
>> Or this interface can be extended return both h/w ring
>> or s/w ring information.
>
> The interface will evolve to provide that information, but it will
> remain project private. It is provided here FYI but will change in
> future revisions of the document.

So the expectation is that the ring APIs should not be used by
clients, and that rings are an internal resource managed only by the
MAC layer?
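
To make the ring-visibility request in Q3.1/Q3.2 more concrete, here is
a rough sketch of the kind of read-only information a client might find
useful. Every name below (xring_info_t, xring_count(), xring_for_cpu())
is hypothetical and made up for illustration; none of it comes from the
Crossbow document or the existing MAC interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types: not part of the Crossbow document or the MAC API. */
typedef enum { XRING_HW, XRING_SW } xring_type_t;

typedef struct {
	unsigned int	xr_id;		/* ring index within the client */
	xring_type_t	xr_type;	/* hardware or software ring */
	int		xr_cpu;		/* CPU id the ring is bound to, -1 if none */
} xring_info_t;

/* Count how many of the client's rings are of the given type. */
size_t
xring_count(const xring_info_t *rings, size_t nrings, xring_type_t type)
{
	size_t i, n = 0;

	assert(rings != NULL || nrings == 0);
	for (i = 0; i < nrings; i++) {
		if (rings[i].xr_type == type)
			n++;
	}
	return (n);
}

/* Return the first ring bound to the given CPU, or NULL if there is none. */
const xring_info_t *
xring_for_cpu(const xring_info_t *rings, size_t nrings, int cpu)
{
	size_t i;

	assert(rings != NULL || nrings == 0);
	for (i = 0; i < nrings; i++) {
		if (rings[i].xr_cpu == cpu)
			return (&rings[i]);
	}
	return (NULL);
}
```

With something along these lines, a client could tell after the open
whether it received hardware rings, software rings or a mix, and which
CPU each ring is bound to, without managing the rings itself.
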
>
>> Q3.3) Are the mac_resource_set() and mac_resources() interfaces
>> going away?
>
> Yes, they will be replaced by different interfaces. But note that
> they are already project private in Nevada and were not supposed to
> be used by other ON components.

Agreed, but the new Crossbow API offers no other way to take advantage
of multiple rings. There is one generic RX callback, but no way to
associate a callback with a specific ring. This is a limitation of the
existing API, and support should be added to the open API list so that
we can process traffic independently. Having some additional APIs to
expose this would be very useful.

>> Q3.4) What is the action taken when no free h/w ring available.
>> As per the documentation of mac_rings_list_get(), if no h/w
>> ring available, it returns NULL. In such case, how does
>> mac_unicast_add() behave when NULL is passed for rings?
>
> mac_unicast_add() no longer takes rings. This will be handled
> transparently to the MAC clients by using a default ring and
> falling back to software classification.

So when there are multiple rings, and these rings are associated with
all the addresses, there is no pre-defined mapping? That can be
inefficient, as the hardware has the ability to associate each ring
with an address. Can we extend the MAC APIs to allow a client to
choose between the default behavior and binding addresses to rings?

>> Q3.5) Are there any interfaces other than the above mac_rings_xxx
>> interfaces that are available to deal with MAC rings?
>
> Not available to MAC clients. The set of project private interfaces
> might evolve as we refine the design.

I would like to see some of the functionality exported via these
private MAC interfaces promoted to an open API. Even if the API cannot
be moved over, can we extend the open APIs to provide hints?

>> Q3.6) Is the mac_rings_list_get() returns the list of mac rings
>> assigned to the client at the time of client open.
>> How can this be changed after the client is open.
>
> The set of assigned rings may change. The details on the APIs
> needed to support this still need to be defined, but they will
> remain project private.

So you are saying a client cannot rely on how many rings are available
to it, and that this can change outside the client's control? Is CPUs
being removed from the system one of the cases in which this will
happen?

>> Q3.7) Assigning h/w rings to a specific MAC address limits the
>> bandwidth to the number of rings that are assigned to that
>> address. Is there a way to not to bind h/w rings specific
>> to MAC address so that the bandwidth could be used by
>> any mac client depending on the traffic?
>
> See Q1.3.

Not sure what you mean. Are you suggesting that some MAC addresses
will have SW rings and others will be associated with HW rings?

>> 4) Receive callback related:
>> (Crossbow-virt.pdf Section 5.2.5 Pg 40)
>> int mac_rx_set(mac_client_handle_t mch, mac_rx_fn_t rx_fn,
>>     void *arg);
>> int mac_rx_clear(mac_client_handle_t mch);
>>
>> QUESTIONS:
>>
>> Q4.1) How can a client get rx callback per ring that is assigned
>> to the mac client? This will allow parallel processing
>> and improve the performance. Such a feature is already
>> being used in the current implementation of LDoms vSwitch
>> driver and the mac_xxx interfaces should support such an
>> ability.
>
> The parallel processing will still happen. I.e. if multiple
> hardware rings or software rings are assigned to a MAC clients,
> multiple connections associated with that MAC client will be spread
> across these rings.

So with multiple rings, there will be concurrent callbacks to rx_fn,
each with packets from the corresponding ring? Also, will each
callback be able to determine which ring made the callback?

>> Q4.2) How can a client get a separate callback for a defined type of
>> traffic, such as different SAP numbers etc.
>> This will be useful to provide out of the band type packet
>> processing or related services.
>
> This will be supported by a MAC flow API built on top of the MAC
> client API. The flow API will be described by a separate document.

So if a client wants to use the flow API, will it need to layer itself
on the flow API rather than on the MAC client API directly? Can you
give me more information on what this layering will look like? Also,
when do you expect the flow API doc to be available?

<..snip..>

>> 5) Transmit related:
>>
>> (Crossbow-virt.pdf Section 5.2.7 Pg 41)
>> mblk_t *mac_tx(mac_client_handle_t *mch, mblk_t *mp, uint64_t hint);
>>
>> QUESTIONS:
>>
>> Q5.1) What are the valid values for the 'hint' argument?
>> From the description on pg 42, NULL seems to be
>> a valid value. Is it safe to assume that the 'hint' is a
>> ring-id, if so, a NULL value of 0 will conflict with a
>> ring-id of 0.
>
> The hint can be any 64 bit value, but it must always be the same
> value for the packets corresponding to the same connection to avoid
> reordering. TCP and UDP for example pass the connection pointer as
> the hint, which allows us to avoid packet inspection for these
> protocols.

Can you clarify what the hint is used for? Is it similar to the case
below, where a hash is applied to the hint value to pick a TX ring?

>> Q5.2) If NULL specified as a 'hint', how is the tx ring
>> selected?
>
> In this case mac_tx() will parse the packet headers and hash on the
> header information to select a transmit ring.

Is the goal here to somehow bifurcate the traffic being sent by a
client via the interface? Is the algorithm pre-determined by the MAC
layer, with either the hint or the misc headers + hash used to
determine the Tx ring for transmit? Is it possible for a client to
request a specific ring, and is picking a unique hint the only way to
do this?

>> Q5.3) The 'hint' argument description says the following.
>> What is the meaning of a connection in this context and
>> how to identify this?
>>
>> "The hint must be the same for packets of the same
>> connection."
>
> It can be a TCP connection for example. This is required to avoid
> reordering of packets for the same connection.

OK.

>> 6) Multicast addresses related:
>> (Crossbow-virt.pdf Section 5.2.6 Pg 41)
>> int mac_multicast_add(mac_client_handle_t mch, const uint8_t
>>     *addr);
>> int mac_multicast_remove(mac_client_handle_t mch, const uint8_t
>>     *addr);
>>
>> No comments at this point.
>>
>> 7) Promiscuous mode related:
>>
>> (Crossbow-virt.pdf Section 5.2.8 Pg 42)
>> Its not clear if the above interface will be available or not,
>> but two new interfaces are added:
>>
>> int mac_promisc_add(mac_client_handle_t mch, mac_promisc_type
>>     promisc_type, mac_promisc_fn_t promisc_fn, void *arg,
>>     mac_promisc_handle_t *php);
>> int mac_promisc_remove(mac_client_handle_t mch,
>>     mac_promisc_handle_t *ph);
>>
>> MAC_PROMISC_ALL - send all packets
>> MAC_PROMISC_MULTI - only broadcast and multicast
>>
>> May be the mac_promisc_add(MAC_PROMISC_ALL) will force device
>> to operate in the promiscuous mode.
>
> Both need to, since the device needs to be in promiscuous mode also
> to receive all multicast traffic.

What do you mean by "Both need to"? In addition to the above interface
to enable promisc mode, will the existing promisc_set() interface be
removed? Also, is PROMISC_ALL unicast + multicast, whereas MULTI is
only all multicast traffic?

>> QUESTIONS:
>>
>> Q7.1) According to the section 4.6, the promiscuous mode operates
>> in the layer2 switch model. When choosing the promiscuous mode
>> model can it be either layer2 switch model or shared
>> ethernet model?
>

Comments?

>> Q7.2) From the explanation of mac_promisc_add(), it seems like
>> the mac_promisc_add() could be called without setting
>> MAC address via mac_unicast_add(). Is this correct?
>> If so, what is the expected behaviour?
>
> Currently we provide the same semantics as a switched environment,
> i.e. a MAC client will see the same traffic that would be seen by a
> NIC connected to a switch.

Is there a way to see only the multicast traffic associated with all
MAC clients, i.e. the union of all mac_client multicast_add addresses?
The MULTI promisc option seems more a way to weed out unicast and
broadcast traffic on the wire and pass all wire multicast traffic up,
including traffic the system may not be interested in. Is this the
case?

> What we would also like to provide is the ability to for a MAC
> client to obtain all the traffic going in and out of the box, as
> well as the traffic exchanged between MAC clients. The non-unicast
> address was part of that solution.

OK.

> Another option would be to generalize this with the shared ethernet
> model, and allow a MAC client to specify that it wants to observe
> all traffic via a separate promiscuous type. I need to see how this
> can be added to the API.

This will be very useful. How about something like
MAC_PROMISC_CLIENTS?

>> 8) Statistics related:
>>
>> Q8.1) Is the mac_stat_get() interface being obsoleted or changed?
>> If so, what is the new equivalent interface?
>
> Yes, there will be a new MAC client interface. The MAC layer will
> also maintain per-MAC client statistics for MAC client specific
> statistics such as number of packets sent/received, etc. I need to
> add that interface to the document.

OK. When will this be available, in the next doc update? Until then,
should we continue to use the mac_stat_get() interface?

>> GENERAL QUESTIONS:
>> =================
>>
>> Qg.1) Are there any GLDv3 MAC client interfaces that are being
>> obsoleted (provided by the Nemo framework) but not documented
>> in this doc?
>
> The MAC client interface was project private, and most of the
> interface is being completely revamped by Crossbow.
> The set of MAC client API available to ON consolidation components
> is described by section 5.2 of the document. Any other MAC client
> API are still project private.

So all interfaces that were part of GLDv3 will be replaced with the
interfaces specified here, and anything that is not specified here
should not be used moving forward?

>> Qg.2) Are there any changes to the MAC driver interfaces or being
>> obsoleted?
>
> The changes made to the driver API will be published as part of a
> separate forthcoming document.

How soon will this doc become available? I see that in the latest doc
Kais published there are some new interfaces for ring support. I
presume there will be a separate doc for the generic MAC driver
interfaces?

>> Qg.3) There are no MAC client interfaces to specify bandwidth
>> attributes. From the section 4.7, it seems like they are
>> implemented as part of VNIC and not as MAC client interfaces.
>> If this is the case, how can the bandwidth attributes be
>> specified?
>
> They are not documented yet, but will be specified as arguments to
> mac_client_open().

Can we expect to see this in the next revision of this document?

>> Qg.4) When will the classification interface be fully documented
>> for review?
>
> There will be separate documents for the MAC driver classification
> interfaces, and for the MAC client flow APIs.

When will this be available?

Thanks
-Narayan