On Thu, Jan 22, 2015 at 10:40 PM, Doug Ledford <[email protected]> wrote: > On Thu, 2015-01-22 at 15:31 +0200, Or Gerlitz wrote: >> Changes from V1: >> 1. always do clear_bit(IPOIB_MCAST_FLAG_BUSY) >> inipoib_mcast_sendonly_join_complete()
> This part is good. good to know... >> 2. Sync between ipoib_mcast_sendonly_join() to >> ipoib_mcast_sendonly_join_complete >> using a IS_ERR_OR_NULL() test > This part is no good. You just added a kernel data corrupter or kernel > oopser depending on the situation. Yep, you probably have a point here, but didn't that exist prior to your 3.19-rc1 series too? in other words, if bug X was there before and you added bug Y and we fix Y that's fine and by the rules. > These items make a passable effort at fixing up the same items as > patches 1 and 2 in my patchset. Patch 3 in my patchset was just > something I noticed, but it isn't drastically important. > This alternative patch does nothing to address what is fixed in these > patches though: The below list is too heavy for rc6 fixes > patch 4: we leaked joins as a result of handling ENETRESET wrong. you > wouldn't see this unless your testing included taking the network up and > down via killing opensm or the like, but it's a real issue > > patch 5: over zealous rescheduling of join task that got in the way of > the flush thread > > patch 6: took out an unneeded spinlock > > patch 7: during my testing, I got an oops in ipoib_mcast_join that is > the result of lack of locking between the flush task and the > mcast_join_task thread > > patch 8: there is a legitimate leak of mcast entries any time we process > this list and get to the end only to find that our device is no longer > up, and given that the debug messages show that we call > mcast_restart_task every time right before we call mcast_dev_flush when > we are downing the interface, the chance of leak here is very high > > patch 9: similar to patch 4, you don't see this unless you are abusing > opensm in order to trigger net events, and then combining that with > removing the module at the same time. but if you have a queue net event > and you don't flush after you unregister, it can oops your kernel later > when the net event finally runs > > patch 10: if you have mcast debugging on, that printk can actually cause > problems that don't otherwise exist because it isn't rate limited in any > way > > This interim patch you are suggesting addresses the first couple items > my patches address, but it also introduces data corrupter/oopser, but > also like I said in a previous email, your testing is myopic, just like > my original testing was, and you are ignoring other items and thinking > your patch is OK when it really isn't. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
