> Author:    meem <Peter.Memishian at Sun.COM>
 > Changeset: 8533:66762dc10e32
 > Comments:
 > mcast_restart_timers_thread() may not always run when requested
 > mcast_restart_timers_thread() panics on exit
 > ilm_walker_step() should skip ILM_DELETE ILMs
 > IPMP stress reveals igmp_input() ill_lock deadlock
 > IPMP stress reveals race between ipif_free() and ill_src_ipif
 > assorted comment fixes

This wad deserves some careful review; please see:

  http://zhadum.east/ws/clearview/clearview-ipmpdev/webrev.ipfixes.2/

The first fix is to address the issue Thiru noticed, which is that we
might not always restart the right timers if the restart-timers thread is
running when we signal it.  The fix is introduces a new IP_MRT_RUN flag
that is atomically set/checked under ips_mrt_lock.  I also fixed a related
bug that might cause us to miss an IP_MRT_STOP request.

The second fix addresses a panic that Xiang found: CALLB_CPR_EXIT()
expects to drop ips_mrt_lock itself, so we need to leave it locked
prior to the call to CALLB_CPR_EXIT().

The third fix addresses something I noticed while working on the fourth
fix: basically, there are a bunch of places (in Nevada too) where we may
try to use ILMs that have been marked ILM_DELETED.  Rather than pepper the
code with these checks, I took advantage of the new ilm_walker_step()
function to move the checks into it.  (There's a glitch in my approach I
noticed while writing this up, which is that ilm_walker_start() doesn't
have similar logic.  I'll fix this shortly.)

The fourth fix addresses an interesting bug that Seb's awesome IPMP stress
test found, and cleans up some code to boot.  Basically, there was a case
in igmp_input() where we may simultaneously lock two ills: one ill_lock is
acquired while walking its ipif_next chain, and another is locked during
that walk via ilm_lookup_ipif() in order to bump the hold count on the
associated IPMP ill.  These ills are not locked in a defined order, so it
can deadlock with other places that lock two ills in a different order via
ill_lock_ills().  The fix is to rework igmp_input() so it doesn't need to
lock two ills at a time, using the same approach that mld_input() already
used for this.  It also removes the ilm_walker_startx() wart.

The fifth fix addresses another bug that the IPMP stress test found.  In
short, if an ipif is in the process of being freed at the same time
another thread is updating the ill_src_ipif rotor to point to it in
ipif_select_source{,_v6}(), then ill_src_ipif will end up pointing to
freed memory.  This race is possible because there's a window between
ipif_free() and ipif_free_tail() where ill_g_lock is dropped.  In this
window, if the ipif is assigned to ill_src_ipif, then nothing will clear
that bad pointer (since ipif_free() is responsible for clearing it and
it's already finished).  The fix is to update ipif_select_source{,_v6}()
to call IPIF_CAN_LOOKUP() prior to assigning ill_src_ipif, since
IPIF_CONDEMNED will be set prior to ipif_free() clearing ill_src_ipif.

The sixth fix addresses assorted comment typos that I noticed while
working on the above issues.  Most of them are Nevada typos.

-- 
meem

Reply via email to