Doug,

You have given me a lot to think about...  Comments below...

> > > >
> > > > While it is a different type of technology, standard verbs[*]
> > > > remains 100% compatible.  Unlike other verbs technologies user space
> > > > software does not need any knowledge that the underlying device is
> > > > not IB.  For example, PR (and SA) queries, CM, rdmacm, and verbs calls
> > > > themselves are all 100% IB compatible.
> > >
> > > Even if OPA is 100% standard verbs compatible which it does not
> > > appear to be, that does not make OPA an extra capability of an IBA device.
> >
> > I don't want to make it an extra capability of an IBA device.  I want to
> > make it an extra capability of a "verbs" device.
> 
> And this, friends, is why it's bad to make both a link layer and a user
> space API with the exact same name ;-).  Anyway, I get your point Ira and it
> makes sense to me.  However, I also get Hal's point.  Our track record on
> this particular issue is a bit wonky though.

Thanks for laying this out.  I too understand Hal's point.

> 
> First we had InfiniBand.
> 
> Then came iWARP, and we used the transport type to differentiate it from an
> actual InfiniBand device, but left the underlying link layer listed as
> InfiniBand.
>
> Then came RoCE, and we listed its transport type as InfiniBand, but changed
> the link layer to Ethernet.  Which left us in the oxymoronic position that
> even though iWARP was over Ethernet, the tools said it was over InfiniBand,
> while RoCE was the only thing that listed Ethernet as the link layer.  We
> later fixed that up with some hacks in tools to keep users from being
> confused and filing bugs.
> 
> Maybe this represents an opportunity to straighten some of this mess out.
> If I remember correctly, this is the matrix of technologies today:
> 
> Technology    LinkLayer       Transport
> 
> InfiniBand    InfiniBand      InfiniBand Verbs
> iWARP         InfiniBand      iWARP Verbs (subset of IBV, with
>                               specific connection establishment
>                               requirements that don't exist with IBV)
> RoCE          Ethernet        InfiniBand Verbs (but with different
>                               addressing because of the different
>                               link layer)
> OPA           ?               InfiniBand Verbs

I think this is _relatively_ accurate.  The one exception is the various IB
verbs extensions which have been introduced.  While most are being pushed into
the spec, not all of them are in the spec before they land in the kernel.  That
makes it difficult to keep up with what "IB Verbs" really is.

Mind you, I'm not opposed to IB Verbs being flexible.  But I think we can
accurately have multiple underlying technologies which support IB Verbs with
various extensions.

> 
> It makes me wonder if we shouldn't make this matrix more accurate:
> 
> Technology    LinkLayer       Transport
> 
> InfiniBand    InfiniBand      InfiniBand Verbs
> iWARP         Ethernet        iWARP Verbs
> RoCE          Ethernet        RoCE-v1 or RoCE-v2
> OPA           ?               OPA Verbs
> 
> With this sort of setup, the core ib_mad/ib_umad code would simply check the
> verbs type to see what support it can enable.  For IBV it would be the
> existing support, for OPAV it would be the additional jumbo support.

OPA, to be compatible with IB Verbs, uses the same node types as InfiniBand 
verbs (1 == CA, 2 == Switch).  As such it returns the same Transport type.
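
To illustrate why the transport type comes out the same, here is a rough sketch in Python (not kernel code).  Only the two node type values come from the text above; the function, constants, and device names are hypothetical and exist purely for illustration.

```python
# Illustrative sketch only: hypothetical names, not kernel identifiers.
NODE_TYPE_CA = 1      # value taken from the text above
NODE_TYPE_SWITCH = 2  # value taken from the text above

TRANSPORT_IB = "InfiniBand Verbs"


def transport_for_node_type(node_type):
    """Return the transport reported for a node type, or None if unknown."""
    if node_type in (NODE_TYPE_CA, NODE_TYPE_SWITCH):
        return TRANSPORT_IB
    return None


# Hypothetical devices: an IB adapter and an OPA adapter both report
# node type 1 (CA), so a lookup keyed on node type cannot tell them apart.
ib_dev = {"name": "ib_example", "node_type": NODE_TYPE_CA}
opa_dev = {"name": "opa_example", "node_type": NODE_TYPE_CA}

same = (transport_for_node_type(ib_dev["node_type"])
        == transport_for_node_type(opa_dev["node_type"]))
print("same transport:", same)
```

Anything keyed only on the node type sees the two devices as identical, which is why I need some other attribute to signal the MAD capability.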

> 
> I'm not sure how much we might expect a change like this to break existing
> software though, so maybe straightening this mess out is a non-starter.

I think this is going to break quite a bit.  I have prototyped setting OPA 
devices to "OPA Link Layer" and the perftest tools just fall over.  Any changes 
to the Link layer or the transport types will require a transition period for 
ULPs.

> 
> > > While it is a primary goal of the RDMA stack to have a common verbs
> > > API for various RDMA interconnects, each one is properly represented
> > > to allow its unique characteristics to be exposed.
> >
> > The difference here is that we have maintained IB Verbs compatibility where
> > other RDMA technologies did not.  We have tested many Verbs applications
> > (both kernel and user space) and they function _without_ _modification_.
> >
> > Despite this compatibility we are still having this discussion.
> >
> > I can think of no other way to signal the MAD capability to the MAD stack
> > which will preserve the verbs compatibility in the same way.
> 
> See above.  Define a new transport type, OPAVerbs, that is a superset of IBV
> and enable jumbo support when OPAV is the transport on the link.

But the transport type is not changing.  Once again we are attempting to be 
completely verbs compatible.  From the MAD stack POV the verbs calls in the 
kernel are not different.

Would it be acceptable if the result of my patch series was:

InfiniBand      InfiniBand      InfiniBand Verbs
iWARP           InfiniBand      iWARP Verbs (subset of IBV, with
                                specific connection establishment
                                requirements that don't exist with IBV)
RoCE            Ethernet        InfiniBand Verbs (but with different
                                addressing because of the different
                                link layer)
OPA             OPA             InfiniBand Verbs

And the MAD stack looked at the link layer to see the difference?
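
Roughly, the check I have in mind would look like the sketch below.  It is illustrative Python, not the actual patch.  The rows mirror the table above, the 256-byte size is the standard IB MAD size, and the 2048-byte jumbo size and all names are assumptions made for the example.

```python
# Illustrative sketch only: hypothetical names, not the kernel MAD stack.
IB_MAD_SIZE = 256          # standard IB MAD size in bytes
OPA_JUMBO_MAD_SIZE = 2048  # assumed jumbo MAD size for OPA

# (link layer, transport) per technology, mirroring the table above.
TECHNOLOGIES = {
    "InfiniBand": ("InfiniBand", "InfiniBand Verbs"),
    "iWARP": ("InfiniBand", "iWARP Verbs"),
    "RoCE": ("Ethernet", "InfiniBand Verbs"),
    "OPA": ("OPA", "InfiniBand Verbs"),
}


def max_mad_size(link_layer):
    """Pick the MAD size from the link layer alone; transport is untouched."""
    if link_layer == "OPA":
        return OPA_JUMBO_MAD_SIZE
    return IB_MAD_SIZE


for name, (link_layer, transport) in TECHNOLOGIES.items():
    print(f"{name:12} {link_layer:12} {transport:18} {max_mad_size(link_layer)}")
```

The transport column stays "InfiniBand Verbs" for OPA, so verbs consumers see no change; only code that asks about the link layer sees the difference.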

Ira
