This is the design document for porting the IPoIB driver to the GLDv3 framework. I am looking for folks to review it.
I have written a prototype based on this design, and it works fine. :)

Thanks in advance!
Lizhi

1 Overview
----------
  This case proposes changes to the Solaris kernel to support a GLDv3-based
  IPoIB driver. It introduces the two primary components of this solution:
  the mac_ib plugin and the GLDv3 IPoIB driver.

  Note that this case covers only the changes necessary to port the IPoIB
  driver to the GLDv3 framework. Other enhancements to the IPoIB driver can
  be made in subsequent projects.

2 The mac_ib plugin
-------------------

  The mac_ib plugin is written to the Nemo MAC-Type Plugin architecture defined
  by PSARC/2006/248.  It will implement the following operations, and be
  installed under /kernel/mac.

    static mactype_ops_t mac_ib_type_ops = {
          MTOPS_PDATA_VERIFY | MTOPS_HEADER_COOK | MTOPS_HEADER_UNCOOK,
          mac_ib_unicst_verify,
          mac_ib_multicst_verify,
          mac_ib_sap_verify,
          mac_ib_header,
          mac_ib_header_info,
          mac_ib_pdata_verify,
          mac_ib_header_cook,
          mac_ib_header_uncook
    };

  A <sys/mac_ib.h> header file will contain the necessary information
  for drivers to use the plugin, namely a MAC_PLUGIN_IDENT_IB macro used
  to identify the plugin during mac_register().

2.1 Multicast/Broadcast address
-------------------------------
  The IPoIB Multicast/Broadcast address format (see RFC4391) is depicted in
  Figure 1:

   |  8 |24 bits| 8  | 4 |  4  | 16 bits  | 16 bits |      80 bits      |
   +----+-------+----+---+-----+----------+---------+-------------------+
   |Resv|  QPN  |0xFF|0x1|scope|IPoIB sign|  P_Key  |      group ID     |
   +----+-------+----+---+-----+----------+---------+-------------------+

                        Figure 1
  
  Since <scope> and <P_Key> can differ between driver instances, the mac_ib
  plugin sets them to zero. All other fields are filled in with their exact
  values by the mac_ib plugin. The IPoIB driver's mc_tx() and mc_multicst()
  callback functions fill in the correct <scope> and <P_Key> values if the
  QPN of the destination address is the Multicast/Broadcast QPN (0xFFFFFF).
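
  The driver-side fix-up described above can be sketched as follows. This is
  only an illustration: the function names are hypothetical, the address is
  treated as a flat 20-byte array laid out as in Figure 1, and the prototype
  driver may well do this differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPOIB_ADDR_LEN  20          /* 160-bit IPoIB link-layer address */
#define IB_MC_QPN       0xFFFFFFu   /* Multicast/Broadcast QPN */

/* Read the 24-bit QPN held in bytes 1..3 (byte 0 is reserved). */
static uint32_t
ipoib_addr_qpn(const uint8_t addr[IPOIB_ADDR_LEN])
{
	return (((uint32_t)addr[1] << 16) | ((uint32_t)addr[2] << 8) |
	    (uint32_t)addr[3]);
}

/*
 * Hypothetical helper: if addr is a multicast/broadcast address, patch in
 * this instance's scope (low nibble of byte 5) and P_Key (bytes 8..9,
 * network byte order). Returns 1 if the address was patched, 0 otherwise.
 */
static int
ipoib_fill_scope_pkey(uint8_t addr[IPOIB_ADDR_LEN], uint8_t scope,
    uint16_t pkey)
{
	assert(addr != NULL);

	if (ipoib_addr_qpn(addr) != IB_MC_QPN)
		return (0);		/* unicast: leave untouched */

	addr[5] = (uint8_t)((addr[5] & 0xF0) | (scope & 0x0F));
	addr[8] = (uint8_t)(pkey >> 8);
	addr[9] = (uint8_t)(pkey & 0xFF);
	return (1);
}
```

  Keeping the check on the QPN means unicast addresses pass through
  unchanged, which matches the condition stated above.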
 
2.2 mac_ib_header
-----------------
  All IP and ARP datagrams transported over InfiniBand are prefixed by
  a 4-octet encapsulation header as illustrated below (see RFC4391).

   | 16 bits  | 16 bits |
   +----------+---------+
   |  type    |  Resv   |
   +----------+---------+

  However, in order to transmit the datagram to the correct destination, an
  extra header that includes the destination address is required. This header
  struct is defined in <sys/mac_ib.h> as illustrated below.

      typedef struct ib_addrs {
          ipoib_mac_t   ipib_src;
          ipoib_mac_t   ipib_dst;
      } ib_addrs_t;

      typedef struct ib_header {
          union {
               ipoib_pgrh_t     ipib_grh;
               ib_addrs_t       ipib_addrs;
          } ipib_prefix;
          ipoib_hdr_t   ipib_rhdr;
      } ib_header_t;

  The Extra Header has the following format:

   | 20 bytes | 20 bytes|
   +----------+---------+
   | ipib_src | ipib_dst|
   +----------+---------+ 

      Extra Header

  For an outbound datagram, mac_ib_header() creates a header and fills in the
  destination address.
  An inbound datagram carries a GRH prefix, which has the same size as the
  Extra Header above. The IPoIB driver rx function extracts the information
  from the GRH and rewrites it as an Extra Header.

  Note: The GRH itself is not used as the Extra Header, because it does not
        contain the destination address.
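
  To make the layout concrete, here is a minimal userland sketch of both
  directions. The three base types are stand-ins built from byte arrays (the
  real ipoib_mac_t, ipoib_pgrh_t and ipoib_hdr_t come from the Solaris
  headers), and ib_build_tx_header() and ib_rebuild_rx_header() are
  hypothetical names, not the plugin's or the driver's actual functions. The
  QPN arguments on the rx side are assumed to be supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in types; field names and layout are assumptions for this sketch. */
typedef struct { uint8_t b[20]; } ipoib_mac_t;	/* resv + QPN + 16-byte GID */
typedef struct {
	uint8_t vertcflow[4];	/* version, traffic class, flow label */
	uint8_t paylen[2];
	uint8_t nxthdr;
	uint8_t hoplim;
	uint8_t sgid[16];
	uint8_t dgid[16];
} ipoib_pgrh_t;
typedef struct { uint8_t type[2]; uint8_t resv[2]; } ipoib_hdr_t;

typedef struct ib_addrs {
	ipoib_mac_t	ipib_src;
	ipoib_mac_t	ipib_dst;
} ib_addrs_t;

typedef struct ib_header {
	union {
		ipoib_pgrh_t	ipib_grh;
		ib_addrs_t	ipib_addrs;
	} ipib_prefix;
	ipoib_hdr_t	ipib_rhdr;
} ib_header_t;

/* The GRH and the Extra Header must overlay each other exactly. */
_Static_assert(sizeof (ipoib_pgrh_t) == sizeof (ib_addrs_t), "prefix size");
_Static_assert(sizeof (ib_header_t) == 44, "40-byte prefix + 4-byte header");

/* Outbound: fill in the addresses and the 16-bit type (network order). */
static void
ib_build_tx_header(ib_header_t *hdr, const ipoib_mac_t *src,
    const ipoib_mac_t *dst, uint16_t sap)
{
	assert(hdr != NULL && dst != NULL);
	memset(hdr, 0, sizeof (*hdr));
	if (src != NULL)
		hdr->ipib_prefix.ipib_addrs.ipib_src = *src;
	hdr->ipib_prefix.ipib_addrs.ipib_dst = *dst;
	hdr->ipib_rhdr.type[0] = (uint8_t)(sap >> 8);
	hdr->ipib_rhdr.type[1] = (uint8_t)(sap & 0xFF);
}

/* Build one 20-byte address from a 24-bit QPN and a 16-byte GID. */
static void
ib_set_addr(ipoib_mac_t *a, uint32_t qpn, const uint8_t gid[16])
{
	a->b[0] = 0;
	a->b[1] = (uint8_t)(qpn >> 16);
	a->b[2] = (uint8_t)(qpn >> 8);
	a->b[3] = (uint8_t)qpn;
	memcpy(&a->b[4], gid, 16);
}

/* Inbound: overwrite the GRH in place with src/dst addresses. */
static void
ib_rebuild_rx_header(ib_header_t *hdr, uint32_t src_qpn, uint32_t dst_qpn)
{
	uint8_t sgid[16], dgid[16];

	assert(hdr != NULL);
	/* Copy the GIDs out first: the union members overlap. */
	memcpy(sgid, hdr->ipib_prefix.ipib_grh.sgid, sizeof (sgid));
	memcpy(dgid, hdr->ipib_prefix.ipib_grh.dgid, sizeof (dgid));
	ib_set_addr(&hdr->ipib_prefix.ipib_addrs.ipib_src, src_qpn, sgid);
	ib_set_addr(&hdr->ipib_prefix.ipib_addrs.ipib_dst, dst_qpn, dgid);
}
```

  The static assertions capture the point made above: the two prefix forms
  occupy the same 40 bytes, so the rx path can convert one into the other
  without moving the payload.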

2.3 mac_ib_header_cook/mac_ib_header_uncook
-------------------------------------------
  In the IPoIB design (PSARC/2001/289), GLDv2 is supposed to send the
  following down to the driver:

       |  20 bytes      |4 bytes|               |
       +----------------+-------+---------------+
       | destination    | Type  |  IP/ARP data  |
       +----------------+-------+---------------+

                  Format A

  The driver, in turn, is supposed to hand the following up to GLDv2:

       |  40 bytes   |4 bytes|               |
       +-------------+-------+---------------+
       |     GRH     | Type  |  IP/ARP data  |
       +-------------+-------+---------------+

                  Format B

  The GRH is stripped off in GLDv2. So for a raw DLPI client, an outbound
  packet is in "Format A" and an inbound packet is "4 bytes Type + IP/ARP
  data".

  After porting to GLDv3, the driver has to remain compatible with raw DLPI
  clients.
  mac_ib_header_cook() will strip off the 20-byte destination address and
  create a new Extra Header (see 2.2).
  mac_ib_header_uncook() will strip off the Extra Header.
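
  The two transformations can be sketched on flat buffers as shown below.
  This is a simplification: the real plugin operations work on STREAMS
  message blocks rather than byte arrays, ib_cook() and ib_uncook() are
  hypothetical names, and passing the source address in as a parameter is an
  assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPOIB_ADDR_LEN	20
#define IB_XHDR_LEN	(2 * IPOIB_ADDR_LEN)	/* Extra Header: src + dst */
#define IPOIB_HDR_LEN	4			/* type + reserved */

/*
 * Cook: "Format A" (dst + type + data) in, Extra Header + type + data out.
 * Returns the number of bytes written, or 0 if the input is too short or
 * the output buffer is too small.
 */
static size_t
ib_cook(const uint8_t *in, size_t inlen, const uint8_t src[IPOIB_ADDR_LEN],
    uint8_t *out, size_t outcap)
{
	size_t rest;

	assert(in != NULL && src != NULL && out != NULL);
	if (inlen < IPOIB_ADDR_LEN + IPOIB_HDR_LEN)
		return (0);
	rest = inlen - IPOIB_ADDR_LEN;		/* type + data */
	if (outcap < IB_XHDR_LEN + rest)
		return (0);

	memcpy(out, src, IPOIB_ADDR_LEN);			/* ipib_src */
	memcpy(out + IPOIB_ADDR_LEN, in, IPOIB_ADDR_LEN);	/* ipib_dst */
	memcpy(out + IB_XHDR_LEN, in + IPOIB_ADDR_LEN, rest);
	return (IB_XHDR_LEN + rest);
}

/*
 * Uncook: drop the Extra Header, leaving "4 bytes Type + IP/ARP data".
 * Returns the number of bytes written, or 0 on a short input or buffer.
 */
static size_t
ib_uncook(const uint8_t *in, size_t inlen, uint8_t *out, size_t outcap)
{
	size_t rest;

	assert(in != NULL && out != NULL);
	if (inlen < IB_XHDR_LEN + IPOIB_HDR_LEN)
		return (0);
	rest = inlen - IB_XHDR_LEN;
	if (outcap < rest)
		return (0);

	memcpy(out, in + IB_XHDR_LEN, rest);
	return (rest);
}
```

  Cooking grows the packet by 20 bytes (the added source address) and
  uncooking shrinks it by 40, so a raw DLPI client keeps seeing the GLDv2
  formats in both directions.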

3 GLDv3 IPoIB driver
--------------------
  Basically, the GLDv3 IPoIB driver is ported from the GLD version, and it
  inherits most of the features of the GLD version driver.
  The remainder of this section discusses the important changes in the GLDv3
  IPoIB driver.

3.1 Add/Remove multicast address
--------------------------------
  For the IPoIB driver, adding or removing a multicast address has to be an
  asynchronous operation, because the driver must negotiate with the
  InfiniBand SA (Subnet Administration) to join or leave a multicast group.
  However, GLDv3 does not currently support asynchronous operation in the
  mc_multicst entry point.
  The IPoIB driver is therefore proposed to return zero (success) immediately
  if the request can be scheduled to be sent, and to wait for the reply in an
  async thread. If the SA reports in its reply that the operation failed,
  IPoIB will write a message to the message log. This is reasonable because
  this sort of failure indicates a fabric problem that needs to be reported
  to the fabric administrator, not to host applications. No recovery
  operation is possible by making changes to Solaris.

  Note that the same applies to setting and unsetting promiscuous mode.
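
  The "accept now, report later" behavior can be sketched as below. All names
  are hypothetical, the SA transaction is reduced to a callback, EAGAIN is an
  arbitrary choice for the "cannot be scheduled" case, and the locking and
  thread wake-up that a real driver needs are left out to keep the sketch
  short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MCG_QLEN	8

typedef struct {
	int		join;		/* 1 = join, 0 = leave */
	unsigned char	addr[20];
} mcg_req_t;

/* Hypothetical SA transaction: returns 0 on success, nonzero on failure. */
typedef int (*sa_op_fn)(const mcg_req_t *);

static mcg_req_t mcg_q[MCG_QLEN];
static size_t mcg_head, mcg_count;
static unsigned mcg_failures;

/* Entry point side: succeed as soon as the request has been queued. */
static int
ibd_multicst(int add, const unsigned char addr[20])
{
	mcg_req_t *r;

	assert(addr != NULL);
	if (mcg_count == MCG_QLEN)
		return (EAGAIN);	/* request cannot be scheduled */

	r = &mcg_q[(mcg_head + mcg_count) % MCG_QLEN];
	r->join = add;
	memcpy(r->addr, addr, sizeof (r->addr));
	mcg_count++;
	return (0);
}

/*
 * Async thread side: handle one queued request. A failure reported by the
 * SA is only logged. Returns 1 if a request was handled, 0 if none queued.
 */
static int
ibd_async_one(sa_op_fn sa_op)
{
	mcg_req_t r;

	assert(sa_op != NULL);
	if (mcg_count == 0)
		return (0);
	r = mcg_q[mcg_head];
	mcg_head = (mcg_head + 1) % MCG_QLEN;
	mcg_count--;

	if (sa_op(&r) != 0) {
		mcg_failures++;
		(void) fprintf(stderr, "ibd: multicast %s failed at the SA\n",
		    r.join ? "join" : "leave");
	}
	return (1);
}

/* Stand-ins for the SA transaction, for exercising the sketch. */
static int sa_stub_ok(const mcg_req_t *r) { (void) r; return (0); }
static int sa_stub_fail(const mcg_req_t *r) { (void) r; return (-1); }
```

  Note that the caller of ibd_multicst() never sees the SA's answer, which is
  exactly the trade-off described above.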

3.2 Service FIFO mechanism
--------------------------
  The GLD version of the IPoIB driver introduced a service FIFO mechanism. On
  machines with multiple CPUs, the interrupt handler does not call gld_recv()
  directly. Instead, it places the received packet on a service FIFO, and a
  worker thread later picks up the packet and calls gld_recv().
  This mechanism will be disabled by default in the GLDv3 driver, since GLDv3
  is expected to provide similar behavior via soft rings (PSARC/2005/654).
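
  A rough sketch of the two receive paths follows. The names, the fixed-size
  queue, the drop-on-full policy and the srv_fifo_enabled switch are all
  assumptions made for illustration, the deliver callback stands in for the
  framework's receive routine, and synchronization between the interrupt
  handler and the worker thread is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define SRV_FIFO_LEN	16

/* Stands in for gld_recv() or its GLDv3 counterpart. */
typedef void (*rx_deliver_fn)(void *pkt);

static int srv_fifo_enabled = 0;	/* disabled by default */
static void *srv_fifo[SRV_FIFO_LEN];
static size_t srv_head, srv_count;

/*
 * Interrupt path. Returns 1 if the packet was delivered directly, 0 if it
 * was queued for the worker thread, -1 if the FIFO was full and the packet
 * was dropped.
 */
static int
ibd_rx_intr(void *pkt, rx_deliver_fn deliver)
{
	assert(deliver != NULL);
	if (!srv_fifo_enabled) {
		deliver(pkt);
		return (1);
	}
	if (srv_count == SRV_FIFO_LEN)
		return (-1);
	srv_fifo[(srv_head + srv_count) % SRV_FIFO_LEN] = pkt;
	srv_count++;
	return (0);
}

/* Worker thread: deliver everything queued so far; returns the count. */
static size_t
ibd_srv_drain(rx_deliver_fn deliver)
{
	size_t n = 0;

	assert(deliver != NULL);
	while (srv_count > 0) {
		void *pkt = srv_fifo[srv_head];

		srv_head = (srv_head + 1) % SRV_FIFO_LEN;
		srv_count--;
		deliver(pkt);
		n++;
	}
	return (n);
}

/* Counting stand-in for the receive routine, for exercising the sketch. */
static size_t rx_delivered;
static void rx_count_stub(void *pkt) { (void) pkt; rx_delivered++; }
```

  With the switch left at its default, every packet takes the direct path,
  which is the behavior proposed for the GLDv3 driver.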

4 Interfaces
------------

______________________________________________________________________________
|                             Interfaces Added                               | 
|_________________________|_______________________|__________________________|
|      mac_ib.h           | Consolidation Private | <sys/mac_ib.h>           |
|_________________________|_______________________|__________________________|

_______________________________________________
networking-discuss mailing list