On Wed, 2015-05-27 at 11:11 -0600, Jason Gunthorpe wrote:
> On Wed, May 27, 2015 at 10:14:06AM -0400, Doug Ledford wrote:
> > > Because the QPN is part of the LLADDR IB can create two interfaces on
> > > the same physical port that are completely separated by hardware. Read
> > > Haggi's email, he explains how they plan to use this to create
> > > interfaces that can be delegated to namespaces. It is not a bad idea
> > > really.. 
> > 
> > Yes, it is actually.  The whole reason we went to GUID matching long ago
> > was because of this exact issue.
> 
> I reflected on this some more last night, and yes, I am leaning toward
> 'bad idea' direction too.
> 
> Too much stuff breaks if you create multiple children with the same
> pkey/guid:
>  - RDMA CM cannot disambiguate CM packets between them
>  - DHCP cannot tell them apart
>  - Net scripts/network manager won't work
>  - IPv6 becomes totally broken
> 
> That means the namespace stuff will have to create children using GUID
> aliases..

Glad we agree on that ;-)

> > The *only* way this will ever be a workable item is if we A) reserve a
> > number of queue pairs from the driver specifically for IPoIB use and B)
> > specify which queue pairs go to which IPoIB devices at IPoIB module
> > load
> 
> This basic idea is exactly why I think we should stick with the 20
> byte LLADDR for IFLA_VF_MAC. It gives a route for the PF to tell the
> VF what QPN to use for IPoIB (if we ever see HW support to implement that)
> 
> If we use 8 bytes then that route is closed off forever.

And that's exactly as it should be.  If we allow setting all 20 bytes
via the VF_MAC calls, then we violate the "guests should behave like
they are on bare metal as much as possible" rule.  As a host, we get a
GUID and if we want to control the QPNs for IPoIB (and indeed if we want
to control how many IPoIB interfaces and on what P_Keys) then we must
create config files in /etc/sysconfig/network-scripts (on Red Hat,
similar requirements on other distros) that would instruct the OS to
create exactly what we want.  But, the key point is that we are only
given a GUID, and we must create everything else from our config files.
Guests should be the same way.  They only get the GUID to start, then
the guest disk image with its self contained configuration will take
over and control the rest.

> > > Not quite, in the 20 byte format the 8 bytes of the GUID are in the
> > > last 8/20 bytes, so the app would have to place 12 zeros and then the
> > > GUID to follow the 20 byte format (or 4 zeros, the prefix, then the GUID)
> > > 
> > > This is why the question of 'what is IFLA_VF_MAC' is so important:
> > > the options presented (MAC, GUID, LLADDR) are all incompatible with
> > > each other.
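
To make the layout concrete, here is a minimal sketch in C of the two
ways a GUID could be placed into the 20 byte format described above.
The helper names and constants are made up for illustration and do not
come from any existing patch. The only assumption about the format is
the usual IPoIB layout: 4 bytes (flags + QPN), 8 bytes of subnet
prefix, then the 8 byte GUID.

```c
#include <stdint.h>
#include <string.h>

#define IPOIB_LLADDR_LEN 20 /* 4 (flags + QPN) + 8 (subnet prefix) + 8 (GUID) */
#define IB_PREFIX_LEN     8
#define IB_GUID_LEN       8

/* Hypothetical helper: 12 zero bytes followed by the GUID. */
static void guid_to_lladdr(const uint8_t guid[IB_GUID_LEN],
                           uint8_t lladdr[IPOIB_LLADDR_LEN])
{
    memset(lladdr, 0, IPOIB_LLADDR_LEN);
    memcpy(lladdr + IPOIB_LLADDR_LEN - IB_GUID_LEN, guid, IB_GUID_LEN);
}

/* Hypothetical helper: 4 zero bytes, the subnet prefix, then the GUID. */
static void prefix_guid_to_lladdr(const uint8_t prefix[IB_PREFIX_LEN],
                                  const uint8_t guid[IB_GUID_LEN],
                                  uint8_t lladdr[IPOIB_LLADDR_LEN])
{
    memset(lladdr, 0, IPOIB_LLADDR_LEN);
    memcpy(lladdr + 4, prefix, IB_PREFIX_LEN);
    memcpy(lladdr + 4 + IB_PREFIX_LEN, guid, IB_GUID_LEN);
}
```

Either way the GUID ends up at offset 12, so a consumer that only cares
about the GUID could read the last 8 bytes and ignore the rest.
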
> > 
> > For Ethernet devices, it's the MAC.  The equivalent of MAC on IB is the
> > GUID.  I would leave it at that.
> 
> Yes, both arguments can be made:
>   - Our netlink end point is targeting an IPoIB interface, and
>     the equivalent to an Ethernet MAC in IPoIB language is the LLADDR.
>   - Our netlink interface is targeting the hardware under the IPoIB
>     interface and that MAC equivalent is the GUID
> 
> > IPoIB devices are constructs on top of
> > the GUID/link, and you can have 10 IPoIB interfaces between the parent
> > and children, but we don't need to specify all of those LLADDRs, we just
> > need to give a unique GUID and allow the guest OS to create their own
> > IPoIB devices on top of that.
> 
> As I've said, I would like to see netdev review that idea before we
> merge any patches..
> 
> There are pragmatic downsides to the 8 byte choice: Userspace
> completely loses the ability to size the address without a table
> based on link type. That is terrible in the context of netlink's
> design.

Well, let's just be clear: netlink/iproute2 screwed the pooch on their
implementation of this stuff.  Any solution that doesn't include fixing
this up in some way is not really a good solution.

>  For instance iproute2 would need IB specific code to format
> the 'ip link show' (review print_vfinfo in iproute2) and to length
> check 'ip link set vf mac'
> 
> If we do use 8, then it would be ideal (and my strong preference) to
> also fix the IFLA_VF_MAC message to have a working length. I think
> that could be done compatibly with a bit of work. At least that way
> iproute2 can be kept clean when it learns to do IB, and we could have
> the option again of using 20 someday if we need.
> 
> So to be clear, to go with the 8 byte option I suggest:
>  - Engage netdev/iproute and confirm they are philosophically OK
>    with IFLA_VF_MAC != IFLA_ADDRESS
>  - Make a kernel patch to properly size the IFLA_VF_MAC message
>  - Make an iproute patch to use the IFLA_VF_MAC size in print_vfinfo
>    instead of hardcoded ETH_ALEN (treating len == 32 as len 6 for compat)
>  - Drop in the IB patch

Sounds like a reasonable plan.
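
For the iproute2 item, the compat rule could look roughly like the
sketch below. This is only my reading of the "len == 32 means len 6"
note, not existing iproute2 code: vf_mac_addr_len() and the LEGACY
constant are hypothetical names, and ETH_ALEN is defined locally so the
snippet stands alone. The 32 comes from the fixed size of the mac[]
field in today's struct ifla_vf_mac.

```c
#include <stddef.h>

#define ETH_ALEN                 6  /* defined locally for the sketch */
#define LEGACY_VF_MAC_FIELD_LEN 32  /* fixed mac[] size in struct ifla_vf_mac */

/*
 * Hypothetical helper: given the length of the address field carried in
 * an IFLA_VF_MAC message, return how many bytes are a real address.
 * An old kernel always sends the full 32 byte field, so that length is
 * treated as a 6 byte Ethernet MAC; any other length is taken as-is.
 */
static size_t vf_mac_addr_len(size_t mac_field_len)
{
    if (mac_field_len == LEGACY_VF_MAC_FIELD_LEN)
        return ETH_ALEN;
    return mac_field_len;
}
```

With something like that, print_vfinfo would format 6 bytes from an old
kernel, and 8 (or someday 20) bytes from a kernel that sizes the
message properly, without a table keyed on link type.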

Or, this is your patch set, are you going to pick up these action items?

-- 
Doug Ledford <[email protected]>
              GPG KeyID: 0E572FDD
