I ran into a problem today when I went to run some tests on my
for-4.2 tree.  For a second there, I was about to seriously have a
conniption fit.  But, after about 6 hours of work bisecting and
debugging, I've come to find that I wasn't so crazy after all.

When I went to install my for-4.2 tree, IPoIB was totally busted, as in
DOA.  I knew I had checked the 4.1 code I submitted to Linus, but I
wanted a good starting point for a bisection, so I compiled a kernel
from my for-4.1-rc branch.  And it was DOA too.  That seriously
unnerved me because I knew I had tested that code.  I did a number of
manual checkouts at possibly suspicious code points, and none of them
made the problem go away.  Then I started doing some debugging on
both the afflicted machine and on the opensm server.  I finally saw that
the afflicted machine was claiming that it was attempting to join the
multicast group, but was reporting error 110 (ETIMEDOUT).  The opensm
server was not seeing the requests at all.
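
For anyone who wants to decode a raw error number like the one above, a
minimal Python sketch follows.  The helper name decode_errno is my own
invention, not anything from the kernel or opensm.  It assumes a Linux
host, where 110 is ETIMEDOUT; other platforms number their errnos
differently.

```python
import errno
import os


def decode_errno(code):
    """Return (symbolic name, human-readable message) for an errno value."""
    # errno.errorcode maps numeric values to names on the running platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 110 is the value reported in the join failure; on Linux this is ETIMEDOUT.
    print(decode_errno(110))
```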

Long story short, I did my testing in the 4.1 merge window and rc phase
on machines without SRIOV enabled.  When you enable SRIOV in the mlx4
driver, the current driver seems to have broken QP0/QP1 multiplexing
support, because the host becomes unable to join the IPoIB multicast
groups.  In addition, with SRIOV enabled, mlx4_en throws corruption
errors on reboot, and the machine has to be power cycled rather than
rebooting cleanly.  From what I can tell, the 4.0 release kernel has
this problem too, and it still exists at least as far as 4.1-rc7 plus
all of my queued up -next patches.

Here are the relevant lines from my /etc/modprobe.d/mlx4.conf file, if
you want to try to reproduce this:

options mlx4_core probe_vf=0 num_vfs=7 port_type_array=1,2
options mlx4_en pfctx=0x28 pfcrx=0x28
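
As a rough aid for checking whether a given test box matches this
setup, here is a small Python sketch that parses those two lines.  All
of the function names are hypothetical helpers of mine.  It assumes, as
I understand the mlx4 module parameters, that a nonzero num_vfs is what
turns SRIOV on, that port_type_array uses 1 for IB and 2 for Ethernet,
and that pfctx/pfcrx are per-priority bitmasks.

```python
def parse_modprobe_options(text):
    """Parse 'options <module> key=value ...' lines into {module: {key: value}}."""
    modules = {}
    for line in text.splitlines():
        # Drop trailing comments, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2 or fields[0] != "options":
            continue
        params = modules.setdefault(fields[1], {})
        for item in fields[2:]:
            key, _, value = item.partition("=")
            params[key] = value
    return modules


def sriov_requested(modules):
    """True if mlx4_core is asked for at least one VF (assumed meaning of num_vfs)."""
    raw = modules.get("mlx4_core", {}).get("num_vfs", "0")
    return any(int(n, 0) > 0 for n in raw.split(","))


def pfc_priorities(mask):
    """Expand a per-priority bitmask (assumed pfctx/pfcrx format) into priorities."""
    return [p for p in range(8) if (mask >> p) & 1]


CONF = """\
options mlx4_core probe_vf=0 num_vfs=7 port_type_array=1,2
options mlx4_en pfctx=0x28 pfcrx=0x28
"""

if __name__ == "__main__":
    mods = parse_modprobe_options(CONF)
    print("SRIOV requested:", sriov_requested(mods))
    print("PFC TX priorities:", pfc_priorities(int(mods["mlx4_en"]["pfctx"], 0)))
```

With the config above this reports SRIOV as requested, and under the
bitmask assumption 0x28 expands to priorities 3 and 5.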

And I'm guessing that your internal regression tests must not have a
machine in IB/Eth SRIOV mode as a standard config.  I would consider
adding it to the mix.  I have it myself, but only on a few machines and
I don't always use them for initial testing.

-- 
Doug Ledford <[email protected]>
              GPG KeyID: 0E572FDD

