On Tue, Aug 18, 2020 at 04:53:42PM +1000, Jonathan Matthew wrote:
> On Mon, Aug 17, 2020 at 03:32:35PM -0400, Winfred Harrelson wrote:
> > On Mon, Aug 17, 2020 at 03:40:47PM +0200, Hrvoje Popovski wrote:
> > > On 17.8.2020. 11:46, Stuart Henderson wrote:
> > > > On 2020-08-15, Hrvoje Popovski <hrv...@srce.hr> wrote:
> > > >> On 15.8.2020. 0:48, Hrvoje Popovski wrote:
> > > >>> On 12.8.2020. 15:18, Winfred Harrelson wrote:
> > > >>>> On Tue, Aug 11, 2020 at 07:52:10PM +0100, Tom Smyth wrote:
> > > >>>>> Hi Winfred,
> > > >>>>> the intel 710 is a complex card,  I would suggest that you try updating the
> > > >>>>> firmware on the card, available from intel.com or your card vendor,
> > > >>>>> you may have to boot to a live linux cd to apply the firmware update,
> > > >>>>>
> > > >>>>> but I had some issues with the Intel XL710 cards and I had to update the
> > > >>>>> firmware to get it working stable,
> > > >>>>>
> > > >>>>> I hope this helps
> > > >>>>> Tom Smyth
> > > >>>>
> > > >>>> Adding misc@openbsd.org back to the CC for the record.
> > > >>>>
> > > >>>> Thanks for the quick reply.  I didn't reply back yesterday because I
> > > >>>> was having trouble getting the firmware updated from a Linux boot disk.
> > > >>>> I ended up having to try from a Windows boot disk.  Unfortunately, I
> > > >>>> am getting the same thing again:
> > > >>>>
> > > >>>>
> > > >>>> wharrels@styx2:/home/wharrels# dmesg | grep ^ixl
> > > >>>> ixl0 at pci5 dev 0 function 0 "Intel XXV710 SFP28" rev 0x02: port 0, FW 8.0.61820 API 1.11, msix, 8 queues, address 3c:fd:fe:ed:b7:28
> > > >>>> ixl1 at pci5 dev 0 function 1 "Intel XXV710 SFP28" rev 0x02: port 1, FW 8.0.61820 API 1.11, msix, 8 queues, address 3c:fd:fe:ed:b7:29
> > > >>>> ixl2 at pci8 dev 0 function 0 "Intel XXV710 SFP28" rev 0x02: port 0, FW 8.0.61820 API 1.11, msix, 8 queues, address 3c:fd:fe:eb:19:b0
> > > >>>> ixl3 at pci8 dev 0 function 1 "Intel XXV710 SFP28" rev 0x02: port 1, FW 8.0.61820 API 1.11, msix, 8 queues, address 3c:fd:fe:eb:19:b1
> > > >>>> ixl4 at pci12 dev 0 function 0 "Intel X722 10GBASE-T" rev 0x09: port 0, FW 3.1.57069 API 1.5, msix, 8 queues, address 3c:ec:ef:1a:df:f2
> > > >>>> ixl5 at pci12 dev 0 function 1 "Intel X722 10GBASE-T" rev 0x09: port 1, FW 3.1.57069 API 1.5, msix, 8 queues, address 3c:ec:ef:1a:df:f3
> > > >>>>
> > > >>>> Yup, all the XXV710 cards have been updated to newest firmware.
> > > >>>>
> > > >>>> Now for the (failed) attempt:
> > > >>>>
> > > >>>> wharrels@styx2:/etc# ifconfig ixl0
> > > >>>> ixl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> > > >>>>         lladdr 3c:fd:fe:ed:b7:28
> > > >>>>         index 1 priority 0 llprio 3
> > > >>>>         media: Ethernet autoselect (25GbaseSR full-duplex)
> > > >>>>         status: active
> > > >>>> wharrels@styx2:/etc# ifconfig ixl2 
> > > >>>> ixl2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> > > >>>>         lladdr 3c:fd:fe:eb:19:b0
> > > >>>>         index 3 priority 0 llprio 3
> > > >>>>         media: Ethernet autoselect (25GbaseSR full-duplex)
> > > >>>>         status: active
> > > >>>> wharrels@styx2:/etc# ifconfig aggr1 create
> > > >>>> wharrels@styx2:/etc# ifconfig aggr1 trunkport ixl0
> > > >>>> wharrels@styx2:/etc# ifconfig aggr1 trunkport ixl2
> > > >>>> wharrels@styx2:/etc# ifconfig aggr1 up
> > > >>>> wharrels@styx2:/etc# ifconfig aggr1
> > > >>>> aggr1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> > > >>>>         lladdr fe:e1:ba:d0:7c:e9
> > > >>>>         index 11 priority 0 llprio 7
> > > >>>>         trunk: trunkproto lacp
> > > >>>>         trunk id: [(8000,fe:e1:ba:d0:7c:e9,000B,0000,0000),
> > > >>>>                  (0000,00:00:00:00:00:00,0000,0000,0000)]
> > > >>>>                 ixl0 lacp actor system pri 0x8000 mac fe:e1:ba:d0:7c:e9, key 0xb, port pri 0x8000 number 0x1
> > > >>>>                 ixl0 lacp actor state activity,aggregation,defaulted
> > > >>>>                 ixl0 lacp partner system pri 0x0 mac 00:00:00:00:00:00, key 0x0, port pri 0x0 number 0x0
> > > >>>>                 ixl0 lacp partner state activity,aggregation,sync
> > > >>>>                 ixl0 port 
> > > >>>>                 ixl2 lacp actor system pri 0x8000 mac fe:e1:ba:d0:7c:e9, key 0xb, port pri 0x8000 number 0x3
> > > >>>>                 ixl2 lacp actor state activity,aggregation,defaulted
> > > >>>>                 ixl2 lacp partner system pri 0x0 mac 00:00:00:00:00:00, key 0x0, port pri 0x0 number 0x0
> > > >>>>                 ixl2 lacp partner state activity,aggregation,sync
> > > >>>>                 ixl2 port 
> > > >>>>         groups: aggr
> > > >>>>         media: Ethernet autoselect
> > > >>>>         status: no carrier
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> I tried doing another sysupgrade this morning just in case something
> > > >>>> had changed overnight but no luck.  Any other ideas?
> > > >>>>
> > > >>>> Winfred
> > > >>>>
> > > >>>
> > > >>> Hi,
> > > >>>
> > > >>> could you try install snapshot from http://ftp.hostserver.de/archive/
> > > >>> that is older than Thu Jun 25 06:41:38 2020 UTC ...
> > > >>>
> > > >>> maybe this commit broke xxv710
> > > >>> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pci/if_ixl.c?rev=1.56&content-type=text/x-cvsweb-markup
> > > >>>
> > > >>> i have vlans over aggr over x710-da2 with latest snapshot and it's
> > > >>> working as expected ..
> > > >>>
> > > >>> ixl0 at pci1 dev 0 function 0 "Intel X710 SFP+" rev 0x02: port 0, FW 7.3.60988 API 1.10, msix, 8 queues
> > > >>> ixl1 at pci1 dev 0 function 1 "Intel X710 SFP+" rev 0x02: port 1, FW 7.3.60988 API 1.10, msix, 8 queues
> > > >>>
> > > >>
> > > >> with new firmware aggr is working
> > > >>
> > > >> ixl0 at pci1 dev 0 function 0 "Intel X710 SFP+" rev 0x02: port 0, FW 8.0.61820 API 1.11, msix, 8 queues
> > > >> ixl1 at pci1 dev 0 function 1 "Intel X710 SFP+" rev 0x02: port 1, FW 8.0.61820 API 1.11, msix, 8 queues
> > > > 
> > > > That's the same firmware as in your previous (failing) report,
> > > > so is that "with new firmware and a snapshot from before Thu Jun 25"?
> > 
> > Stuart, you may have gotten message from Hrvoje confused with mine
> > (Winfred).  Hrvoje seems to have gotten this to work but I haven't.
> > I can use trunk(4) but I just think it would be nice to try to find
> > out what is going on here.  Don't want to be a pain though.
> > 
> > > 
> > > it would be great if winfred could test snapshot before Jun 25 with
> > > xxv710 card. x710 card works great with new firmware (8.0) and older one
> > > 7.3 ..
> > 
> > I have no way of testing this (25Gbps cards in lacp bond) at home
> > so I have been testing at work.  This is why I haven't done anything
> > over the weekend.
> > 
> > Grabbed snapshot from 2020-06-24 with same results:
> 
> This sounds like multicast filters aren't working properly with your nic.
> trunk(4) puts trunk ports in promisc mode, so multicast filters don't matter,
> but aggr(4) doesn't.  Could you try running 'tcpdump -ni ixl0' for a while and
> see if that side of the aggr starts working?

I left the tcpdump running for a little over 5 minutes but that changed nothing:

wharrels@styx2:/etc# ifconfig aggr1   
aggr1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr fe:e1:ba:d1:25:69
        index 12 priority 0 llprio 7
        trunk: trunkproto lacp
        trunk id: [(8000,fe:e1:ba:d1:25:69,000C,0000,0000),
                 (0000,00:00:00:00:00:00,0000,0000,0000)]
                ixl0 lacp actor system pri 0x8000 mac fe:e1:ba:d1:25:69, key 0xc, port pri 0x8000 number 0x1
                ixl0 lacp actor state activity,aggregation,defaulted
                ixl0 lacp partner system pri 0x0 mac 00:00:00:00:00:00, key 0x0, port pri 0x0 number 0x0
                ixl0 lacp partner state activity,aggregation,sync
                ixl0 port 
                ixl1 lacp actor system pri 0x8000 mac fe:e1:ba:d1:25:69, key 0xc, port pri 0x8000 number 0x2
                ixl1 lacp actor state activity,aggregation,defaulted
                ixl1 lacp partner system pri 0x0 mac 00:00:00:00:00:00, key 0x0, port pri 0x0 number 0x0
                ixl1 lacp partner state activity,aggregation,sync
                ixl1 port 
        groups: aggr
        media: Ethernet autoselect
        status: no carrier


I also ran the same tcpdump on ixl1 but that didn't help.
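
For anyone comparing against their own setup: in the output above both
ports show actor state "defaulted" and an all-zero partner, which in LACP
terms means the port is still using default partner information, i.e. no
LACPDU from the switch has been accepted on it.  A minimal sketch in Python
for spotting this, assuming the ifconfig aggr output format shown above.
The function name ports_without_partner is made up for illustration and is
not part of any OpenBSD tool:

```python
import re

# Matches e.g. "ixl0 lacp actor state activity,aggregation,defaulted"
STATE_RE = re.compile(r"(\S+) lacp actor state (\S+)")
# Matches e.g. "ixl0 lacp partner system pri 0x0 mac 00:00:00:00:00:00"
PARTNER_RE = re.compile(
    r"(\S+) lacp partner system pri \S+ mac ([0-9a-f:]{17})"
)

ZERO_MAC = "00:00:00:00:00:00"


def ports_without_partner(ifconfig_output):
    """Return ports that are 'defaulted' and report an all-zero partner MAC."""
    # Collapse all whitespace so that mail-wrapped lines still match.
    text = " ".join(ifconfig_output.split())
    actor = {port: flags.split(",") for port, flags in STATE_RE.findall(text)}
    partner = dict(PARTNER_RE.findall(text))
    return sorted(
        port
        for port, flags in actor.items()
        if "defaulted" in flags and partner.get(port) == ZERO_MAC
    )


# Excerpt of the ifconfig aggr1 output from this message (hypothetical usage).
sample = """\
        ixl0 lacp actor system pri 0x8000 mac fe:e1:ba:d1:25:69, key
0xc, port pri 0x8000 number 0x1
        ixl0 lacp actor state activity,aggregation,defaulted
        ixl0 lacp partner system pri 0x0 mac 00:00:00:00:00:00, key
0x0, port pri 0x0 number 0x0
        ixl0 lacp partner state activity,aggregation,sync
"""
print(ports_without_partner(sample))
```

An empty list would suggest every port has learned a real partner; any
port listed is still waiting to hear from the other side.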

> Other parts of the output indicate we're not compatible with some aspects of the
> new firmware API, so I guess we have some work to do there.

Wasn't working with the older firmware for the XXV710 cards either.
I originally had FW 6.0.48442 before updating to the newest.

Don't know how much longer I can keep messing with this before I have
to go back to trunk(4) and put this box into production.

I will try to get another test box up soon if I can.  I also may be
able to get hold of some Mellanox cards for testing.

Thanks again to everyone for your time and help!

Winfred
