Doug,
You have given me a lot to think about... Comments below...
> > > >
> > > > While it is a different type of technology, standard verbs[*] remains
> > > > 100% compatible. Unlike other verbs technologies user space software
> > > > does not need any knowledge that the underlying device is not IB.
> > > > For example, PR (and SA) queries, CM, rdmacm, and verbs calls
> > > > themselves are all 100% IB compatible.
> > >
> > > Even if OPA is 100% standard verbs compatible which it does not
> > > appear to be, that does not make OPA an extra capability of an IBA device.
> >
> > I don't want to make it an extra capability of an IBA device. I want to
> > make it an extra capability of a "verbs" device.
>
> And this, friends, is why it's bad to make both a link layer and a user
> space API with the exact same name ;-). Anyway, I get your point Ira and it
> makes sense to me. However, I also get Hal's point. Our track record on this
> particular issue is a bit wonky though.
Thanks for laying this out. I too understand Hal's point.
>
> First we had InfiniBand.
>
> Then came iWARP, and we used the transport type to differentiate it from an
> actual InfiniBand device, but left the underlying link layer listed as
> InfiniBand.
>
> Then came RoCE, and we listed its transport type as InfiniBand, but changed
> the link layer to Ethernet. Which left us in the oxymoronic position that
> even though iWARP was over Ethernet, the tools said it was over InfiniBand,
> while RoCE was the only thing that listed Ethernet as the link layer. We
> later fixed that up with some hacks in tools to keep users from being
> confused and filing bugs.
>
> Maybe this represents an opportunity to straighten some of this mess out. If
> I remember correctly, this is the matrix of technologies today:
>
> Technology   LinkLayer    Transport
>
> InfiniBand   InfiniBand   InfiniBand Verbs
> iWARP        InfiniBand   iWARP Verbs (subset of IBV, with
>                           specific connection establishment
>                           requirements that don't exist with IBV)
> RoCE         Ethernet     InfiniBand Verbs (but with different
>                           addressing because of the different
>                           link layer)
> OPA          ?            InfiniBand Verbs
I think this is _relatively_ accurate. The one exception is the various
IB verbs extensions which have been introduced. While most are being pushed
into the spec, not all of them are in the spec prior to being in the kernel.
That makes it difficult to keep up with what "IB Verbs" really is.
Mind you, I'm not opposed to IB Verbs being flexible. But I think we can
accurately have multiple underlying technologies which support IB Verbs with
various extensions.
>
> It makes me wonder if we shouldn't make this matrix more accurate:
>
> Technology   LinkLayer    Transport
>
> InfiniBand   InfiniBand   InfiniBand Verbs
> iWARP        Ethernet     iWARP Verbs
> RoCE         Ethernet     RoCE-v1 or RoCE-v2
> OPA          ?            OPA Verbs
>
> With this sort of setup, the core ib_mad/ib_umad code would simply check the
> verbs type to see what support it can enable. For IBV it would be the
> existing support, for OPAV it would be the additional jumbo support.
OPA, to be compatible with IB Verbs, uses the same node types as InfiniBand
verbs (1 == CA, 2 == Switch). As such it returns the same Transport type.
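To illustrate why identical node types lead to an identical transport type, here is a minimal sketch of a node-type-to-transport mapping. All names are hypothetical and only loosely modeled on the idea; this is not the kernel's actual code. The values 1 and 2 come from the text above, while the RNIC value is an assumption made for illustration.

```c
/* Illustrative sketch only: hypothetical names, not the kernel's definitions. */
enum sketch_node_type {
	SKETCH_NODE_CA     = 1,	/* 1 == CA, as stated above */
	SKETCH_NODE_SWITCH = 2,	/* 2 == Switch, as stated above */
	SKETCH_NODE_RNIC   = 4	/* assumed value for an iWARP-style device */
};

enum sketch_transport {
	SKETCH_TRANSPORT_UNKNOWN = 0,
	SKETCH_TRANSPORT_IB,
	SKETCH_TRANSPORT_IWARP
};

/*
 * Derive the transport purely from the node type. A device that reports
 * node type 1 or 2 is indistinguishable from InfiniBand at this level,
 * regardless of what the underlying link actually is.
 */
enum sketch_transport sketch_node_get_transport(int node_type)
{
	switch (node_type) {
	case SKETCH_NODE_CA:
	case SKETCH_NODE_SWITCH:
		return SKETCH_TRANSPORT_IB;
	case SKETCH_NODE_RNIC:
		return SKETCH_TRANSPORT_IWARP;
	default:
		return SKETCH_TRANSPORT_UNKNOWN;
	}
}
```

With a mapping of this shape, an OPA device reporting node type 1 or 2 would be classified as the IB transport unless a new node type or a separate attribute is introduced.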
>
> I'm not sure how much we might expect a change like this to break existing
> software though, so maybe straightening this mess out is a non-starter.
I think this is going to break quite a bit. I have prototyped setting OPA
devices to "OPA Link Layer" and the perftest tools just fall over. Any changes
to the link layer or the transport types will require a transition period for
ULPs.
>
> > > While it is a primary goal of the RDMA stack to have a common verbs
> > > API for various RDMA interconnects, each one is properly represented
> > > to allow its unique characteristics to be exposed.
> >
> > The difference here is that we have maintained IB Verbs compatibility where
> > other RDMA technologies did not. We have tested many Verbs applications
> > (both kernel and user space) and they function _without_ _modification_.
> >
> > Despite this compatibility we are still having this discussion.
> >
> > I can think of no other way to signal the MAD capability to the MAD stack
> > which will preserve the verbs compatibility in the same way.
>
> See above. Define a new transport type, OPAVerbs, that is a superset of IBV
> and enable jumbo support when OPAV is the transport on the link.
But the transport type is not changing. Once again we are attempting to be
completely verbs compatible. From the MAD stack POV the verbs calls in the
kernel are not different.
Would it be acceptable if the result of my patch series was:
InfiniBand   InfiniBand   InfiniBand Verbs
iWARP        InfiniBand   iWARP Verbs (subset of IBV, with
                          specific connection establishment
                          requirements that don't exist with IBV)
RoCE         Ethernet     InfiniBand Verbs (but with different
                          addressing because of the different
                          link layer)
OPA          OPA          InfiniBand Verbs
And the MAD stack looked at the link layer to see the difference?
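A rough sketch of what such a link-layer check could look like, for illustration only. The enum, struct, and function names are hypothetical and are not taken from the patch series. The 256-byte figure is the standard IB MAD size; the 2048-byte jumbo size is an assumption made for this example.

```c
#include <stddef.h>

/* Illustrative sketch only: hypothetical names, not the actual patch. */
enum sketch_link_layer {
	SKETCH_LL_INFINIBAND,
	SKETCH_LL_ETHERNET,
	SKETCH_LL_OPA		/* the proposed new link layer value */
};

#define SKETCH_IB_MAD_SIZE	256	/* standard IB MAD size */
#define SKETCH_OPA_MAD_SIZE	2048	/* assumed jumbo MAD size */

struct sketch_port {
	enum sketch_link_layer link_layer;
};

/* The transport stays "InfiniBand Verbs"; only the link layer differs. */
int sketch_port_supports_jumbo_mad(const struct sketch_port *port)
{
	return port->link_layer == SKETCH_LL_OPA;
}

/* Pick the largest MAD the MAD stack may send or receive on this port. */
size_t sketch_max_mad_size(const struct sketch_port *port)
{
	if (sketch_port_supports_jumbo_mad(port))
		return SKETCH_OPA_MAD_SIZE;
	return SKETCH_IB_MAD_SIZE;
}
```

The point of keying on the link layer is that verbs consumers which only look at the transport type would see no change, while the MAD stack could still tell the two kinds of port apart.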
Ira