On Tue, 2015-01-13 at 22:13 +0200, Or Gerlitz wrote: > On Tue, Jan 13, 2015 at 6:45 PM, Doug Ledford <[email protected]> wrote: > > On Fri, 2015-01-09 at 10:32 +0200, Or Gerlitz wrote: > >> On Wed, Jan 7, 2015 at 5:04 PM, Or Gerlitz <[email protected]> wrote: > >> > From: Erez Shitrit <[email protected]> > >> > > >> > Following commit 016d9fb25cd9 "IPoIB: fix MCAST_FLAG_BUSY usage" > >> > both IPv6 traffic and for the most cases all IPv4 multicast traffic > >> > aren't working. > >> > >> Doug, can you ack the breakage introduced by your commit and the fix? > > > > I haven't double checked the breakage, I'll take your word for it > > just try ping6 or iperf multicast and see it for yourself, please.
I have. I have them working now. > > > (at the time I did my work, I had multicast debugging on and I verified the > > join/leave process, but I had assumed that the process would work the > > same for optional multicast groups as it does for the IPoIB broadcast > > group and other default IPoIB groups, so I didn't specifically test > > additional multicast groups above and beyond the broadcast/etc groups). > > > > However, the fix is not workable. In particular, as soon as this patch > > is added to the kernel, you will start getting messages like this: > > > > mlx4_ib0: ipoib_mcast_leave on an in-flight join > > > I don't see it on my systems, is that upstream you're running? what entity > does > ,;x4_ib0: prefixed prints and under what settings, is that the IPoIB driver? No, that's my internal rhel7 tree, but it's so close to bare upstream when it comes to IPoIB that there really shouldn't be any difference. And mlx4_ib0 is just the name that I renamed ib0 to (An internal standard in my test cluster is that all ib interfaces are named based upon the hardware they are tied to, so mlx4_ib0, mlx5_ib0, qib_ib0, etc. Makes my life in verifying code coverage easier) I can test again with an upstream kernel later today. But, for now, suffice it to say the problem is not resolved with the patch in this thread, but with the much simpler patch I've attached to this email (note, this is made against rhel7, I'm only attaching it so you can try it yourself, assuming it even applies, and once I've tested it on an upstream kernel myself and verified that it works properly, I'll submit the final upstream version under a new thread). -- Doug Ledford <[email protected]> GPG KeyID: 0E572FDD
commit dc2c71896edc3b549e79a51cf83db813d6012a34 Author: Doug Ledford <[email protected]> Date: Tue Jan 13 13:35:40 2015 -0500 IB/ipoib: Fix failed multicast joins/sends The usage of IPOIB_MCAST_RUN as a flag is inconsistent. In some places it is used to mean "our device is administratively allowed to send multicast joins/leaves/packets" and in other places it means "our multicast join task thread is currently running and will process your request if you put it on the queue". However, this latter meaning is in fact flawed as there is a race condition between the join task testing the mcast list and finding it empty of remaining work, dropping the mcast mutex and also the priv->lock spinlock, and clearing the IPOIB_MCAST_RUN flag. Further, there are numerous locations that use the flag in the former fashion, and when all tasks complete and the task thread clears the RUN flag, all of those other locations will fail to ever again queue any work. This results in the interface coming up fine initially, but having problems adding new multicast groups after the first round of groups have all been added and the RUN flag is cleared by the join task thread when it thinks it is done. To resolve this issue, convert all locations in the code to treat the RUN flag as an indicator that the multicast portion of this interface is in fact administratively up and joins/leaves/sends can be performed. There is no harm (other than a slight performance penalty) to never clearing this flag and using it in this fashion as it simply means that a few places that used to micro-optimize how often this task was queued on a work queue will now queue the task a few extra times. We can address that suboptimal behavior in future patches. Signed-off-by: Doug Ledford <[email protected]> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index 8a538c010b9..cba6e160df2 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -638,8 +638,6 @@ void ipoib_mcast_join_task(struct work_struct *work) } ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n"); - - clear_bit(IPOIB_MCAST_RUN, &priv->flags); } int ipoib_mcast_start_thread(struct net_device *dev) @@ -649,8 +647,8 @@ int ipoib_mcast_start_thread(struct net_device *dev) ipoib_dbg_mcast(priv, "starting multicast thread\n"); mutex_lock(&mcast_mutex); - if (!test_and_set_bit(IPOIB_MCAST_RUN, &priv->flags)) - queue_delayed_work(priv->wq, &priv->mcast_task, 0); + set_bit(IPOIB_MCAST_RUN, &priv->flags); + queue_delayed_work(priv->wq, &priv->mcast_task, 0); mutex_unlock(&mcast_mutex); return 0; @@ -733,7 +731,7 @@ void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb) memcpy(mcast->mcmember.mgid.raw, mgid, sizeof (union ib_gid)); __ipoib_mcast_add(dev, mcast); list_add_tail(&mcast->list, &priv->multicast_list); - if (!test_and_set_bit(IPOIB_MCAST_RUN, &priv->flags)) + if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) queue_delayed_work(priv->wq, &priv->mcast_task, 0); } @@ -959,7 +957,8 @@ void ipoib_mcast_restart_task(struct work_struct *work) /* * Restart our join task if needed */ - ipoib_mcast_start_thread(dev); + if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) + queue_delayed_work(priv->wq, &priv->mcast_task, 0); rtnl_unlock(); }
signature.asc
Description: This is a digitally signed message part
