This patchset revives the 8 patches that were reverted from 3.19,
the 11 patches that fixed the problems with the first 8, and the
single patch that addressed reaping of AHs and the failure to
dealloc resources on shutdown, and then adds two new patches that
would have been enhancements rather than bugfixes and hence weren't
appropriate to post during the 3.19 tussle.

Testing of this patchset is currently underway, but it has done
well so far.  IPv4 multicast, IPv6 multicast, connected mode,
datagram mode, rmmod/insmod while active, restart opensm while
active, ifconfig up/ifconfig down in a tight while loop have
all passed.
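For reference, the ifconfig up/down tight loop mentioned above can be
reproduced with something along these lines (interface name and iteration
count are illustrative placeholders, not from the original test setup; the
run() wrapper just echoes the commands so the sketch is safe to execute,
drop the echo to actually flap the interface, which requires root):

```shell
#!/bin/sh
# Sketch of the ifconfig up/down stress loop described above.
# IFACE and COUNT are assumed defaults, not from the original report.
IFACE="${IFACE:-ib0}"
COUNT="${COUNT:-100}"

# run() only echoes the command so this sketch is harmless to execute;
# remove the echo to really bring the interface down and up.
run() { echo "$@"; }

i=0
while [ "$i" -lt "$COUNT" ]; do
    run ifconfig "$IFACE" down
    run ifconfig "$IFACE" up
    i=$((i + 1))
done
```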

There are two outstanding issues that I think still need to be
addressed (while performing all the other testing I ran across these
issues, and I think they existed prior to my patchset, but I haven't
booted up a clean kernel to verify that yet... I'll do that tomorrow
and if things are not as I expect, I'll report back here):

1) In connected mode, the initial ip6 ping to any host takes almost
exactly 1 second to complete.  The debug messages show this delay
very clearly:

[19059.689967] qib_ib0: joining MGID ff12:601b:ffff:0000:0000:0001:ff31:7791
[19059.689970] qib_ib0: successfully started all multicast joins
[19059.690313] qib_ib0: sendonly join completion for ff12:601b:ffff:0000:0000:0001:ff31:7791 (status 0)
[19059.690314] qib_ib0: Created ah ffff88080cc0ef60
[19059.690315] qib_ib0: MGID ff12:601b:ffff:0000:0000:0001:ff31:7791 AV ffff88080cc0ef60, LID 0xc035, SL 0
    <- final debug message from creating our AH, and when we should
       have requeued our sends
[19060.694190] qib_ib0: REQ arrived
    <- almost exactly 1 second later, we finally start setting up our
       connection

In datagram mode, this does not happen and initial startup of ping6
is mostly immediate.

2) In connected mode, restarting opensm repeatedly can cause some of
the machines to start failing to find other machines when trying to
use ping6.  However, they don't lose connectivity to all machines,
only specific machines.  A rmmod/insmod cycle solves the problem.
So does a full ifdown/ifup cycle.  Given enough idle time, the
problem goes away.  I suspect that neighbor flushing when in
connected mode is not reliable/sufficient when opensm events come
in.  Again, I think this exists in the upstream kernel and I'll
test more on that tomorrow.

Doug Ledford (22):
  IB/ipoib: Consolidate rtnl_lock tasks in workqueue
  IB/ipoib: Make the carrier_on_task race aware
  IB/ipoib: fix MCAST_FLAG_BUSY usage
  IB/ipoib: fix mcast_dev_flush/mcast_restart_task race
  IB/ipoib: change init sequence ordering
  IB/ipoib: Use dedicated workqueues per interface
  IB/ipoib: Make ipoib_mcast_stop_thread flush the workqueue
  IB/ipoib: No longer use flush as a parameter
  IB/ipoib: fix IPOIB_MCAST_RUN flag usage
  IB/ipoib: Add a helper to restart the multicast task
  IB/ipoib: make delayed tasks not hold up everything
  IB/ipoib: Handle -ENETRESET properly in our callback
  IB/ipoib: don't restart our thread on ENETRESET
  IB/ipoib: remove unneeded locks
  IB/ipoib: fix race between mcast_dev_flush and mcast_join
  IB/ipoib: fix ipoib_mcast_restart_task
  IB/ipoib: flush the ipoib_workqueue on unregister
  IB/ipoib: cleanup a couple debug messages
  IB/ipoib: make sure we reap all our ah on shutdown
  IB/ipoib: don't queue a work struct up twice
  IB/ipoib: deserialize multicast joins
  IB/ipoib: drop mcast_mutex usage

 drivers/infiniband/ulp/ipoib/ipoib.h           |  20 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        |  18 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        |  69 ++--
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |  51 ++-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 479 ++++++++++++-------------
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |  24 +-
 6 files changed, 356 insertions(+), 305 deletions(-)

-- 
2.1.0
