2. Architecture
---------------
This is a higher-level approach to the problem, but I came up with the
following QoS relationship hierarchy, where '->' means 'maps to'.
Application Service -> Service ID (or range)
Service ID -> desired QoS
QoS, SGID, DGID, PKey -> SGID, DGID, TClass, FlowLabel, PKey
SGID, DGID, TC, FL, PKey -> SLID, DLID, SL (set if crossing subnets)
SLID, DLID, SL -> MTU, Rate, VL, PacketLifeTime
I use these relationships below:
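The mapping chain above can be sketched as a set of lookup tables. Below is a minimal C sketch; every struct, field, and function name (svc_to_qos, lookup_qos, demo_qos_for, etc.) is invented for illustration and does not come from any real API:

```c
#include <stdint.h>

/* Hypothetical table entries mirroring the mapping chain above. */

struct svc_to_qos {        /* Service ID (or range) -> desired QoS level */
    uint64_t svc_id_lo, svc_id_hi;
    int      qos_level;
};

struct path_to_limits {    /* SLID, DLID, SL -> MTU, Rate, VL, lifetime */
    uint16_t slid, dlid;
    uint8_t  sl;
    uint8_t  mtu, rate, vl, pkt_life;
};

/* Resolve a Service ID to a QoS level by scanning range entries;
 * returns -1 when no range matches. */
static int lookup_qos(const struct svc_to_qos *tbl, int n, uint64_t svc_id)
{
    for (int i = 0; i < n; i++)
        if (svc_id >= tbl[i].svc_id_lo && svc_id <= tbl[i].svc_id_hi)
            return tbl[i].qos_level;
    return -1;
}

/* Example table: the SDP Service-ID range (see the SDP section below)
 * maps to QoS level 3 -- purely illustrative values. */
static int demo_qos_for(uint64_t svc_id)
{
    static const struct svc_to_qos tbl[] = {
        { 0x0000000000010000ULL, 0x000000000001ffffULL, 3 },
    };
    return lookup_qos(tbl, 1, svc_id);
}
```

The remaining levels of the hierarchy would be further tables of the same shape, each keyed by the output of the previous one.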
4. IPoIB
--------
IPoIB already queries the SA for its broadcast group information. The
additional functionality required is for IPoIB to provide the
broadcast group SL, MTU, and rate in every subsequent PathRecord query
it performs when a new UDAV is needed. We could assign a special
Service-ID for IPoIB use, but since all communication on the same
IPoIB interface shares the same QoS-Level, with no way to
differentiate by target service, we can ignore it for simplicity.
Rather than IPoIB specifying SL, MTU, and rate with PR queries, it
should specify TClass and FlowLabel. This is necessary for IPoIB to
span IB subnets.
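A sketch of what such a query might look like, assuming a hypothetical pr_query structure with a component mask; the real ib_sa structures and mask flags differ, so treat all names below as placeholders:

```c
#include <stdint.h>

/* Hypothetical path-record query: field and flag names are
 * illustrative, not the actual ib_sa definitions. */
struct pr_query {
    uint32_t comp_mask;
    uint8_t  tclass;
    uint32_t flow_label;  /* 20 bits used */
    uint8_t  mtu, rate, sl;
};

enum {
    PR_COMP_TCLASS     = 1 << 0,
    PR_COMP_FLOW_LABEL = 1 << 1,
    PR_COMP_MTU        = 1 << 2,
    PR_COMP_RATE       = 1 << 3,
    PR_COMP_SL         = 1 << 4,
};

/* Per the suggestion above: rather than pinning SL/MTU/rate taken
 * from the broadcast group, IPoIB carries only TClass and FlowLabel,
 * so the SA remains free to pick a path that crosses subnets. */
static void ipoib_fill_pr(struct pr_query *q,
                          uint8_t tclass, uint32_t flow_label)
{
    q->comp_mask  = PR_COMP_TCLASS | PR_COMP_FLOW_LABEL;
    q->tclass     = tclass;
    q->flow_label = flow_label & 0xfffff;  /* FlowLabel is 20 bits */
}
```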
5. CMA features
---------------
The CMA interface supports Service-IDs through the notion of a port
space acting as a prefix to the port_num that is part of the sockaddr
provided to rdma_resolve_addr(). What is missing is an explicit
request for a QoS-Class, which would allow a ULP (such as SDP) to
propagate a specific request for a class of service. A mechanism for
providing the QoS-Class is available in the IPv6 address, so we could
use that address field. Another option is to implement a special
connection-options API for CMA.
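One possible encoding for the IPv6-address route is the flow-information word, mirroring the IPv6 header layout (8 bits of traffic class above a 20-bit flow label). Whether the CMA would interpret sin6_flowinfo this way is exactly the open design question, so the helpers below are only a sketch of the bit layout:

```c
#include <stdint.h>

/* Sketch: pack a QoS-Class (as TClass) plus FlowLabel the way the
 * IPv6 header lays them out -- bits 0-19 flow label, bits 20-27
 * traffic class.  Function names are invented for illustration. */

static uint32_t qos_pack(uint8_t tclass, uint32_t flow_label)
{
    return ((uint32_t)tclass << 20) | (flow_label & 0xfffff);
}

static uint8_t qos_tclass(uint32_t flowinfo)
{
    return (flowinfo >> 20) & 0xff;
}

static uint32_t qos_flow_label(uint32_t flowinfo)
{
    return flowinfo & 0xfffff;
}
```

A ULP could fill this word in the sockaddr_in6 it hands to the CMA, and the CMA would unpack it when building the PR query.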
Also missing from the CMA is use of the provided QoS-Class and
Service-ID in the PR/MPR it sends. Once a response is obtained, it is
an existing requirement that the CMA use the PR/MPR from the response
when setting up the QP address vector.
I think the RDMA CM needs two solutions, depending on which address
family is used. For IPv6, the existing interface is sufficient and
works for both IB and iWARP; the RDMA CM only needs to include the TC
and FL as part of its PR query. For IPv4, to remain transport
neutral, I think we should add an rdma_set_option() routine to specify
the QoS field. The RDMA CM would include the QoS field in the PR
query under this condition.
For IB, this requires changes to the ib_sa to support the new PR
extensions. I don't think we gain anything by having the RDMA CM
include Service-IDs as part of the query.
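A hypothetical shape for such an rdma_set_option() routine; the struct, option name, and semantics below are invented for illustration and are not an existing librdmacm call signature:

```c
#include <stdint.h>
#include <string.h>

/* Invented option name for the proposed QoS knob. */
enum { RDMA_OPT_QOS_CLASS = 1 };

/* Toy stand-in for a CM id, holding just the state this sketch needs. */
struct rdma_cm_id_sketch {
    int      qos_set;     /* was a QoS-Class provided by the ULP? */
    uint16_t qos_class;
};

/* Record the QoS-Class on the id; a later PR query would include it
 * only when qos_set is true.  Returns 0 on success, -1 on bad args. */
static int rdma_set_option_sketch(struct rdma_cm_id_sketch *id,
                                  int optname,
                                  const void *optval, size_t optlen)
{
    if (optname != RDMA_OPT_QOS_CLASS || optlen != sizeof(uint16_t))
        return -1;
    memcpy(&id->qos_class, optval, sizeof(uint16_t));
    id->qos_set = 1;
    return 0;
}
```

The point of the option-based route is that it stays transport neutral: an iWARP provider could map the same value to its own QoS mechanism.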
6. SDP
------
SDP uses CMA for building its connections. The Service-ID for SDP is
0x000000000001PPPP, where PPPP are 4 hex digits holding the remote
TCP/IP Port Number to connect to. SDP might be given the SO_PRIORITY
socket option; in that case, the value provided should be sent to the
CMA as the TClass option for that connection.
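The Service-ID construction above is mechanical enough to show directly; a one-line helper following the 0x000000000001PPPP layout from the text:

```c
#include <stdint.h>

/* Build the SDP Service-ID from the remote TCP port: the constant
 * prefix 0x000000000001 in the upper bytes, the port in the low 16
 * bits, per the layout described above. */
static uint64_t sdp_service_id(uint16_t tcp_port)
{
    return 0x0000000000010000ULL | tcp_port;
}
```

For example, a connection to TCP port 0x1234 yields Service-ID 0x0000000000011234.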
SDP would specify the QoS through the IPv6 address or the
rdma_set_option() routine.
7. SRP
------
The current SRP implementation uses its own CM callbacks (not the
CMA), so SRP should fill in the Service-ID in the PR/MPR by itself
and use that information when setting up the QP. The T10 SRP
standard leaves the SRP Service-ID to be defined by the SRP target
I/O Controller (though it should also comply with the IBTA Service-ID
rules). In any case, the Service-ID is reported by the I/O Controller
in the ServiceEntries DM attribute and should be used in the PR/MPR
if the SA reports its ability to handle QoS PR/MPRs.
I agree.
8. iSER
-------

iSER uses CMA and thus should be very close to SDP.
The Service-ID for iSER should be TBD.
See RDMA CM and SDP.
3.2. PR/MPR query handling:

OpenSM should be able to enforce the provided policy on client
requests. The overall flow for such a request is: first, the request
is matched against the defined match rules to find the target
QoS-Level definition; then, given the QoS-Level, a path search is
performed under the restrictions imposed by that level. The following
two sections describe these steps.
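The two steps can be sketched as a pair of table lookups; all types and names below (match_rule, qos_level, etc.) are illustrative, not OpenSM's actual structures:

```c
#include <stdint.h>

struct qos_level {            /* restrictions imposed on the path search */
    uint8_t sl;
    uint8_t min_mtu;          /* encoded IB MTU value */
    uint8_t max_rate;         /* encoded IB rate value */
};

struct match_rule {           /* request attributes -> QoS-Level index */
    uint64_t svc_id_lo, svc_id_hi;
    uint16_t pkey;            /* 0 == wildcard */
    int      level;
};

/* Step 1: match the request against the rules; first hit wins.
 * Returns the QoS-Level index, or -1 when nothing matches (whether
 * that means a default level or a rejection is a policy decision). */
static int match_request(const struct match_rule *rules, int n,
                         uint64_t svc_id, uint16_t pkey)
{
    for (int i = 0; i < n; i++)
        if (svc_id >= rules[i].svc_id_lo && svc_id <= rules[i].svc_id_hi &&
            (rules[i].pkey == 0 || rules[i].pkey == pkey))
            return rules[i].level;
    return -1;
}

/* Step 2: a candidate path passes only if it satisfies the level's
 * restrictions. */
static int path_ok(const struct qos_level *lvl, uint8_t mtu, uint8_t rate)
{
    return mtu >= lvl->min_mtu && rate <= lvl->max_rate;
}
```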
If we use the QoS hierarchy outlined above, I think we can construct
some fairly simple tables to guide our PR selection. The SA may need
to construct the tables starting at the bottom and working up, but I
*think* it could be done. And by distributing the tables, we can
support a more distributed (a la local SA) operation.
From an administration point of view, I would be happier seeing
something where the administrator defines a QoS level in terms of
latency or bandwidth requirements and relative priority. Then, if
desired, the administrator could provide more details, such as
indicating which nodes would use which services, minimum required
MTUs, etc. It would then be up to the SA to map these requirements
to specific TC, FL, SL, and VL values.
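A sketch of what an administrator-facing level might hold versus what the SA would derive from it; the derivation here is a toy placeholder (a real SA would consult fabric state), and every name is invented:

```c
#include <stdint.h>

/* What the administrator states: requirements plus relative priority. */
struct admin_qos_level {
    uint32_t max_latency_us;
    uint32_t min_bw_mbps;
    uint8_t  priority;         /* relative priority among levels */
};

/* What the SA derives for the wire. */
struct wire_qos {
    uint8_t  tclass, sl, vl;
    uint32_t flow_label;
};

/* Toy derivation: map priority straight onto SL/VL and fold the level
 * index into TClass.  Only the division of labor is the point here. */
static struct wire_qos derive_wire_qos(const struct admin_qos_level *a,
                                       int level_idx)
{
    struct wire_qos w;
    w.sl         = a->priority & 0x0f;
    w.vl         = w.sl;
    w.tclass     = (uint8_t)level_idx;
    w.flow_label = 0;
    return w;
}
```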
In general, though, I'm personally far less concerned with the QoS
specification interface to the SA than with the operation that takes
place on the hosts.
Comments on using this approach on the host side?