> Author: meem <Peter.Memishian at Sun.COM> > Changeset: 8533:66762dc10e32 > Comments: > mcast_restart_timers_thread() may not always run when requested > mcast_restart_timers_thread() panics on exit > ilm_walker_step() should skip ILM_DELETE ILMs > IPMP stress reveals igmp_input() ill_lock deadlock > IPMP stress reveals race between ipif_free() and ill_src_ipif > assorted comment fixes
This wad deserves some careful review; please see: http://zhadum.east/ws/clearview/clearview-ipmpdev/webrev.ipfixes.2/ The first fix is to address the issue Thiru noticed, which is that we might not always restart the right timers if the restart-timers thread is running when we signal it. The fix is introduces a new IP_MRT_RUN flag that is atomically set/checked under ips_mrt_lock. I also fixed a related bug that might cause us to miss an IP_MRT_STOP request. The second fix addresses a panic that Xiang found: CALLB_CPR_EXIT() expects to drop ips_mrt_lock itself, so we need to leave it locked prior to the call to CALLB_CPR_EXIT(). The third fix addresses something I noticed while working on the fourth fix: basically, there are a bunch of places (in Nevada too) where we may try to use ILMs that have been marked ILM_DELETED. Rather than pepper the code with these checks, I took advantage of the new ilm_walker_step() function to move the checks into it. (There's a glitch in my approach I noticed while writing this up, which is that ilm_walker_start() doesn't have similar logic. I'll fix this shortly.) The fourth fix addresses an interesting bug that Seb's awesome IPMP stress test found, and cleans up some code to boot. Basically, there was a case in igmp_input() where we may simultaneously lock two ills: one ill_lock is acquired while walking its ipif_next chain, and another is locked during that walk via ilm_lookup_ipif() in order to bump the hold count on the associated IPMP ill. These ills are not locked in a defined order, so it can deadlock with other places that lock two ills in a different order via ill_lock_ills(). The fix is to rework igmp_input() so it doesn't need to lock two ills at a time, using the same approach that mld_input() already used for this. It also removes the ilm_walker_startx() wart. The fifth fix addresses another bug that the IPMP stress test found. In short, if an ipif is in the process of being freed at the same time another thread is updating the ill_src_ipif rotor to point to it in ipif_select_source{,_v6}(), then ill_src_ipif will end up pointing to freed memory. This race is possible because there's a window between ipif_free() and ipif_free_tail() where ill_g_lock is dropped. In this window, if the ipif is assigned to ill_src_ipif, then nothing will clear that bad pointer (since ipif_free() is responsible for clearing it and it's already finished). The fix is to update ipif_select_source{,_v6}() to call IPIF_CAN_LOOKUP() prior to assigning ill_src_ipif, since IPIF_CONDEMNED will be set prior to ipif_free() clearing ill_src_ipif. The sixth fix addresses assorted comment typos that I noticed while working on the above issues. Most of them are Nevada typos. -- meem
