Hi Sebastien,

Thanks a lot for your input.

Sebastien Roy wrote:
Lizhi Hou wrote:
This is the design document for porting IPoIB driver to GLDv3 framework. I am looking for folks to review this.
I have done a prototype code based on this design. It works fine. :)

Excellent stuff.  I have a couple of minor comments:

* Can you mention the name of the IPoIB driver in the Overview and in section 3?

Sure. I will add the driver name 'ibd' to section 3.

* Can you add a subsection to section 2 describing mac_ib_sap_verify(), and what the SAP space looks like for IB?

2.5 mac_ib_sap_verify
---------------------
  mac_ib_sap_verify() checks the legality of a SAP value. Based on
  PSARC/2003/150, the SAP range 0-255 selects IEEE 802 semantics, so
  mac_ib_sap_verify() returns B_TRUE and sets bind_sap (if non-NULL) to the
  LLC SAP to which GLDv3 should bind DLPI consumers. The SAP range 256-65535
  selects EtherType semantics, so mac_ib_sap_verify() returns B_TRUE and sets
  bind_sap to the SAP value itself. For all other SAP values,
  mac_ib_sap_verify() returns B_FALSE.

* The plugin defines a mac_ib_pdata_verify() function, but does not define the expected format of plugin data. Drivers must know that information, as they pass in plugin data in mac_register_t or as part of mac_pdata_update().

The pdata is useless. I removed the pdata and mac_ib_pdata_verify().

* In section 2.1, you say that the mac_ib plugin sets parts of the multicast or broadcast address to certain values, but you don't mention in what context (as part of what operation?).

When the mac_ib plugin module loads, the broadcast address is registered with the GLDv3 framework by calling mactype_register().

* Please add the ident for the mac_ib plugin to the interface table (MAC_PLUGIN_IDENT_IB).

Done.

-Seb

The updated document is attached.

Thanks,
Lizhi

1. Introduction
    1.1. Project/Component Working Name:
         IPoIB conversion to GLDv3
    1.2. Name of Document Author/Supplier:
         Author:  Lizhi Hou
    1.3. Date of This Document:
         24 Oct, 2007

Technical Description:

1 Overview
----------
  This case proposes changes to the Solaris kernel to support a GLDv3-based
  IPoIB driver, ibd(7d). It introduces the two primary components of this
  solution: the mac_ib plugin and the GLDv3 IPoIB driver.

  Note that this case covers only the changes necessary for porting the IPoIB
  driver to the GLDv3 framework. Additional enhancements to the IPoIB driver
  will be done separately. The overall IPoIB architecture is defined by
  PSARC/2001/289.

2 The mac_ib plugin
-------------------

  The mac_ib plugin is written to the Nemo MAC-Type Plugin architecture defined
  by the Nemo design document, revision 9. The plugin fills in the mtr_ops
  callbacks with functions appropriate for IB, as shown below.

    static mactype_ops_t mac_ib_type_ops = {
          MTOPS_HEADER_COOK | MTOPS_HEADER_UNCOOK | MTOPS_LINK_DETAILS,
          mac_ib_unicst_verify,
          mac_ib_multicst_verify,
          mac_ib_sap_verify,
          mac_ib_header,
          mac_ib_header_info,
          NULL,                     /* pdata verify */
          mac_ib_header_cook,
          mac_ib_header_uncook,
          mac_ib_link_details
    };

  A <sys/mac_ib.h> header file will contain the necessary information
  for drivers to use the plugin, namely a MAC_PLUGIN_IDENT_IB macro used
  to identify the plugin during mac_register().

  Note that the Nemo design document can be obtained from the OpenSolaris
  website (see section 5).

2.1 Multicast/Broadcast address
-------------------------------
  The current MAC plug-in design assumes that a single broadcast address is
  defined for the interconnect (as on Ethernet). However, IPoIB defines a
  broadcast address per IPoIB link (see RFC 4391).

  The IPoIB Multicast/Broadcast address is depicted in Figure 1:
  ( see definition in RFC 4391, section 4 )

   |  8 |24 bits| 8  | 4 |  4  | 16 bits  | 16 bits |      80 bits      |
   +----+-------+----+---+-----+----------+---------+-------------------+
   |Resv|  QPN  |0xFF|0x1|scope|IPoIB sign|  P_Key  |      group ID     |
   +----+-------+----+---+-----+----------+---------+-------------------+

                        Figure 1
  
  Since <scope> and <P_Key> can differ between driver instances, the mac_ib
  plugin sets them to zero. All other fields are filled with their exact
  values in the mac_ib plugin. When the mac_ib plugin module loads, this
  broadcast address is registered with the GLDv3 framework by calling
  mactype_register(). In the IPoIB driver's mc_tx() and mc_multicst() callback
  functions, <scope> and <P_Key> are filled with the correct values if the QPN
  of the destination address is the Multicast/Broadcast QPN (0xFFFFFF).

  Since mc_multicst() fills in <scope> and <P_Key>, no changes are necessary
  to the multicast-related IB code in if_ip.c.
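
  The fix-up done in mc_tx() and mc_multicst() can be sketched as follows.
  This is only an illustration on a flat 20-byte array, assuming the byte
  offsets implied by Figure 1; the helper name sketch_fill_scope_pkey() and
  the constants are hypothetical and are not taken from the ibd source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets within the 20-byte IPoIB address, derived from Figure 1. */
#define SKETCH_ADDRL		20
#define SKETCH_OFF_QPN		1	/* 24-bit QPN follows the reserved byte */
#define SKETCH_OFF_SCOPE	5	/* high nibble 0x1, low nibble <scope> */
#define SKETCH_OFF_PKEY		8	/* 16-bit <P_Key>, network byte order */
#define SKETCH_MCAST_QPN	0xFFFFFFu

/*
 * Hypothetical helper: if addr carries the Multicast/Broadcast QPN, fill in
 * the per-instance <scope> and <P_Key> that the plugin left as zero.
 * Returns true when the address was patched, false for unicast addresses.
 */
bool
sketch_fill_scope_pkey(uint8_t *addr, uint8_t scope, uint16_t pkey)
{
	uint32_t qpn;

	assert(addr != NULL);
	qpn = ((uint32_t)addr[SKETCH_OFF_QPN] << 16) |
	    ((uint32_t)addr[SKETCH_OFF_QPN + 1] << 8) |
	    (uint32_t)addr[SKETCH_OFF_QPN + 2];
	if (qpn != SKETCH_MCAST_QPN)
		return (false);

	/* Keep the high nibble (0x1) and replace only the scope nibble. */
	addr[SKETCH_OFF_SCOPE] =
	    (uint8_t)((addr[SKETCH_OFF_SCOPE] & 0xF0) | (scope & 0x0F));
	addr[SKETCH_OFF_PKEY] = (uint8_t)(pkey >> 8);
	addr[SKETCH_OFF_PKEY + 1] = (uint8_t)(pkey & 0xFF);
	return (true);
}
```

  A unicast destination (any QPN other than 0xFFFFFF) is left untouched, which
  matches the condition stated above.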
 
2.2 mac_ib_header
-----------------
  All IP and ARP datagrams transported over InfiniBand are prefixed by
  a 4-octet encapsulation header, as illustrated below (see RFC 4391).

   | 16 bits  | 16 bits |
   +----------+---------+
   |  type    |  Resv   |
   +----------+---------+

  However, to transmit the datagram to the correct destination, an extra
  header that includes the destination address is required. IB does not provide
  an interface for sending a link-layer header directly to the IB link, and the
  link-layer header received from the IB link is missing information that GLDv3
  requires. The mac_ib plugin therefore specifies a "soft" header in
  <sys/mac_ib.h>, as illustrated below.

      typedef struct ib_addrs {
          ipoib_mac_t   ipib_src;
          ipoib_mac_t   ipib_dst;
      } ib_addrs_t;

      typedef struct ib_header_info {
          union {
               ipoib_pgrh_t     ipib_grh;
               ib_addrs_t       ipib_addrs;
          } ipib_prefix;
          ipoib_hdr_t   ipib_rhdr;
      } ib_header_info_t;

  This extra header has the format below:

   | 20 bytes | 20 bytes|
   +----------+---------+
   | ipib_src | ipib_dst|
   +----------+---------+ 

   Header_info structure

  For outbound datagrams, mac_ib_header() creates the Header_info structure
  and fills in the destination address.
  For inbound datagrams, the IB link delivers one of the IB link-layer
  headers, called the Global Routing Header (GRH). The IPoIB driver uses
  information from it to build the Header_info structure and passes that
  structure up to GLDv3 with the datagram.
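
  The outbound case can be sketched as follows. The three base types are
  byte-array stand-ins sized to match the figures above (20, 40 and 4 bytes),
  not the real IPoIB definitions, and sketch_build_outbound() is a
  hypothetical name. Zeroing the source address is an assumption; the text
  above only says that the destination is filled in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in types; the real definitions live in the IPoIB headers. */
typedef struct { uint8_t bytes[20]; } ipoib_mac_t;	/* QPN + GID */
typedef struct { uint8_t bytes[40]; } ipoib_pgrh_t;	/* GRH from the link */
typedef struct { uint8_t type[2]; uint8_t resv[2]; } ipoib_hdr_t;

typedef struct ib_addrs {
	ipoib_mac_t	ipib_src;
	ipoib_mac_t	ipib_dst;
} ib_addrs_t;

typedef struct ib_header_info {
	union {
		ipoib_pgrh_t	ipib_grh;
		ib_addrs_t	ipib_addrs;
	} ipib_prefix;
	ipoib_hdr_t	ipib_rhdr;
} ib_header_info_t;

/*
 * Hypothetical outbound path: clear the header, record the destination
 * address, and store the 16-bit type in network byte order.
 */
void
sketch_build_outbound(ib_header_info_t *hdr, const ipoib_mac_t *dst,
    uint16_t type)
{
	assert(hdr != NULL && dst != NULL);
	memset(hdr, 0, sizeof (*hdr));
	hdr->ipib_prefix.ipib_addrs.ipib_dst = *dst;
	hdr->ipib_rhdr.type[0] = (uint8_t)(type >> 8);
	hdr->ipib_rhdr.type[1] = (uint8_t)(type & 0xFF);
}
```

  Because the GRH and the source/destination pair share a union, the inbound
  and outbound paths hand GLDv3 a structure of the same size.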

2.3 mac_ib_header_cook/mac_ib_header_uncook
-------------------------------------------
  In the IPoIB design (PSARC/2001/289), GLDv2 is supposed to send this down to
  the driver:

       |  20 bytes      |4 bytes|               |
       +----------------+-------+---------------+
       | destination    | Type  |  IP/ARP data  |
       +----------------+-------+---------------+

                  Format A

  And the driver is supposed to hand this over to GLDv2:

       |  40 bytes   |4 bytes|               |
       +-------------+-------+---------------+
       |     GRH     | Type  |  IP/ARP data  |
       +-------------+-------+---------------+

                  Format B

  After porting to GLDv3, the driver has to remain compatible with raw DLPI
  clients. mac_ib_header_cook() strips off the 20-byte destination address and
  creates a new Header_info structure (see 2.2).
  mac_ib_header_uncook() strips off the extra header.
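
  The two transformations can be illustrated on flat byte buffers as below.
  This is a sketch only: the real plugin operates on message blocks, and the
  function names, the zeroed source address and the error handling here are
  assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ADDRL	20			/* one IPoIB address */
#define SKETCH_HDRL	4			/* type + reserved */
#define SKETCH_EXTRAL	(2 * SKETCH_ADDRL)	/* extra src|dst header */

/*
 * Hypothetical "cook": turn Format A (dst | type | data) into
 * src | dst | type | data, with the source address zeroed.
 * Returns the cooked length, or 0 if the input is too short or the
 * output buffer is too small.
 */
size_t
sketch_cook(const uint8_t *in, size_t in_len, uint8_t *out, size_t out_cap)
{
	assert(in != NULL && out != NULL);
	if (in_len < SKETCH_ADDRL + SKETCH_HDRL)
		return (0);
	if (out_cap < in_len + SKETCH_ADDRL)
		return (0);

	memset(out, 0, SKETCH_ADDRL);		/* source not known here */
	memcpy(out + SKETCH_ADDRL, in, in_len);	/* dst, type, data follow */
	return (in_len + SKETCH_ADDRL);
}

/*
 * Hypothetical "uncook": skip the 40-byte extra header. Returns a pointer
 * to the type field and stores the remaining length in *out_len, or
 * returns NULL if the input is too short.
 */
const uint8_t *
sketch_uncook(const uint8_t *in, size_t in_len, size_t *out_len)
{
	assert(in != NULL && out_len != NULL);
	if (in_len < SKETCH_EXTRAL + SKETCH_HDRL)
		return (NULL);
	*out_len = in_len - SKETCH_EXTRAL;
	return (in + SKETCH_EXTRAL);
}
```

  In this flat-buffer model, cooking amounts to making room for a source
  address in front of the destination, and uncooking leaves the 4-byte
  encapsulation header followed by the IP/ARP data.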

2.4 mac_ib_link_details
-----------------------
  When the link is active, mac_ib_link_details() will be called to provide
  details on link speed.

2.5 mac_ib_sap_verify
---------------------
  mac_ib_sap_verify() checks the legality of a SAP value. Based on
  PSARC/2003/150, the SAP range 0-255 selects IEEE 802 semantics, so
  mac_ib_sap_verify() returns B_TRUE and sets bind_sap (if non-NULL) to the
  LLC SAP to which GLDv3 should bind DLPI consumers. The SAP range 256-65535
  selects EtherType semantics, so mac_ib_sap_verify() returns B_TRUE and sets
  bind_sap to the SAP value itself. For all other SAP values,
  mac_ib_sap_verify() returns B_FALSE.
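
  The rule above can be written out as a short function. This is a sketch in
  portable C rather than the kernel code: bool stands in for boolean_t, and
  SKETCH_SAP_LLC is a placeholder for whatever LLC bind SAP GLDv3 expects (its
  value here is an assumption).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SAP_LLC		0u	/* placeholder LLC bind SAP (assumed) */
#define SKETCH_SAP_LLC_MAX	255u	/* 0-255: IEEE 802 semantics */
#define SKETCH_SAP_TYPE_MAX	65535u	/* 256-65535: EtherType semantics */

static_assert(SKETCH_SAP_LLC_MAX < SKETCH_SAP_TYPE_MAX,
    "the two SAP ranges must not overlap");

/*
 * Returns true for a legal SAP and, if bind_sap is non-NULL, stores the SAP
 * that consumers should be bound to. Leaves *bind_sap untouched otherwise.
 */
bool
sketch_sap_verify(uint32_t sap, uint32_t *bind_sap)
{
	if (sap <= SKETCH_SAP_LLC_MAX) {
		if (bind_sap != NULL)
			*bind_sap = SKETCH_SAP_LLC;
		return (true);
	}
	if (sap <= SKETCH_SAP_TYPE_MAX) {
		if (bind_sap != NULL)
			*bind_sap = sap;
		return (true);
	}
	return (false);
}
```

  For example, a SAP of 0x0800 falls in the EtherType range and is bound as
  is, while 0x10000 is rejected.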

3 GLDv3 IPoIB driver
--------------------
  Basically, the GLDv3 IPoIB driver ibd(7d) is ported from the GLD version.
  Most of the features of the GLD version of the driver are inherited.
  The remainder of this section discusses the important changes in the GLDv3
  IPoIB driver.

3.1 Add/Remove multicast address
--------------------------------
  The GLDv3 architecture assumes that adding/removing multicast addresses and
  setting/unsetting promiscuous mode are done by manipulating data structures
  managed by the NIC interface. This is also true for IPoIB; however, it is
  also necessary to communicate with an IB fabric entity called the SA to make
  corresponding changes in the IB switches in the IB fabric. The communication
  with the SA is handled asynchronously by the IPoIB driver. In this scenario,
  the IPoIB driver is proposed to return zero (success) immediately if the
  request can be scheduled to be sent, and to wait for the reply in an async
  thread. If the SA
      (1) fails to respond or
      (2) can't satisfy the request,
  then an error is logged.
  This is reasonable because this sort of failure indicates a fabric
  problem and needs to be reported to the fabric administrator, not the host
  applications. There are no recovery operations that can be done by making
  changes to Solaris.
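
  The proposed contract can be modelled in a few lines. The sketch below is
  single-threaded and every name in it is hypothetical; in particular, the -1
  return for a request that cannot be scheduled and the stub SA are
  assumptions, and a real driver would need locking between the callback and
  the async thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_QLEN	8

typedef struct {
	int	pending[SKETCH_QLEN];	/* queued multicast group ids */
	int	count;
	int	errors_logged;
} sketch_sa_queue_t;

/* Multicast add/remove callback: schedule the request and return at once. */
int
sketch_schedule_request(sketch_sa_queue_t *q, int group)
{
	assert(q != NULL);
	if (q->count == SKETCH_QLEN)
		return (-1);		/* could not be scheduled (assumed) */
	q->pending[q->count++] = group;
	return (0);			/* success, before the SA has replied */
}

/* Async thread: sa_request() stands in for the round trip to the SA. */
void
sketch_async_drain(sketch_sa_queue_t *q, bool (*sa_request)(int group))
{
	assert(q != NULL && sa_request != NULL);
	for (int i = 0; i < q->count; i++) {
		if (!sa_request(q->pending[i])) {
			/* Failure is only logged; the caller already saw 0. */
			(void) fprintf(stderr,
			    "SA request for group %d failed\n", q->pending[i]);
			q->errors_logged++;
		}
	}
	q->count = 0;
}

/* Stub SA for illustration only: rejects odd group ids. */
bool
sketch_stub_sa(int group)
{
	return (group % 2 == 0);
}
```

  The point of the model is that the caller sees zero as soon as the request
  is queued, so a later SA failure can only surface through the log.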

3.2 Service FIFO mechanism
--------------------------
  The GLD version of the IPoIB driver introduced a service FIFO mechanism. On
  machines with multiple CPUs, the interrupt handler does not call gld_recv()
  directly. Instead, it sends the received packet to a service FIFO; a worker
  thread picks up the packet and calls gld_recv() later.
  This mechanism will be disabled by default in the GLDv3 driver, since GLDv3
  is supposed to do a similar thing via soft rings (PSARC/2005/654).
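
  For readers unfamiliar with the GLD-era mechanism, a service FIFO of this
  kind can be sketched as a small ring buffer. All names below are
  hypothetical, the delivery stub merely counts packets in place of
  gld_recv(), and the synchronization a real driver needs between the
  interrupt handler and the worker thread is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_FIFO_SLOTS	4

typedef struct {
	const void	*slot[SKETCH_FIFO_SLOTS];
	size_t		head;	/* next slot to read */
	size_t		tail;	/* next slot to write */
	size_t		used;
} sketch_fifo_t;

/* Interrupt side: queue the packet instead of delivering it directly. */
bool
sketch_fifo_put(sketch_fifo_t *f, const void *pkt)
{
	assert(f != NULL);
	if (f->used == SKETCH_FIFO_SLOTS)
		return (false);		/* FIFO full */
	f->slot[f->tail] = pkt;
	f->tail = (f->tail + 1) % SKETCH_FIFO_SLOTS;
	f->used++;
	return (true);
}

/* Worker side: hand every queued packet to deliver(), oldest first. */
size_t
sketch_fifo_drain(sketch_fifo_t *f, void (*deliver)(const void *pkt))
{
	size_t n = 0;

	assert(f != NULL && deliver != NULL);
	while (f->used > 0) {
		deliver(f->slot[f->head]);
		f->head = (f->head + 1) % SKETCH_FIFO_SLOTS;
		f->used--;
		n++;
	}
	return (n);
}

/* Stand-in for gld_recv(): just counts deliveries. */
size_t sketch_delivered;

void
sketch_deliver(const void *pkt)
{
	(void) pkt;
	sketch_delivered++;
}
```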

4. Interfaces
-------------

______________________________________________________________________________
|                             Interfaces Added                               | 
|_________________________|_______________________|__________________________|
|      mac_ib.h           | Consolidation Private | <sys/mac_ib.h>           |
|  MAC_PLUGIN_IDENT_IB    | Consolidation Private | <sys/mac_ib.h>           |
|_________________________|_______________________|__________________________|

5. References
-------------
  http://opensolaris.org/os/community/networking/nemo-design.pdf
  ftp://ftp.rfc-editor.org/in-notes/rfc4391.txt
_______________________________________________
networking-discuss mailing list
[email protected]
