Some of you may recall that we've sporadically been seeing IPMP group
destroy bugs wherein a an IPMP group cannot be recreated because the ARP
plumbing for the group interface didn't get properly destroyed. Well, I
finally managed to get a semi-reproducible testcase for the bug yesterday
and eliminate it.
However, the semi-reproducible testcase itself spurred new questions. In
particular, one requirement for destroying a group is that all its data
addresses must be brought down -- that is, ill_ipif_up_count on group's
IPMP ill must be zero. To satisfy this constraint, ifconfig internally
builds a list of IFF_UP addresses and brings each one down. However,
sometimes, despite ifconfig's request to bring each one down, some would
still be up. Digging around revealed what seems to be a serious and
longstanding bug that basically causes bringing down an address to
silently not work.
To wit, if ipif_down() finds that the address is not quiesced, it enqueues
the ioctl request returns and EINPROGRESS to ip_sioctl_flags(). Once the
ipif has quiesced ipif_ill_refrele_tail() calls ip_sioctl_flags_restart(),
which does:
ipif_down_tail(ipif);
if (ipip->ipi_cmd_type == IF_CMD) {
err = ip_sioctl_flags_tail(ipif,
(uint64_t)(ifr->ifr_flags & 0x0000ffff),
q, mp, B_TRUE);
} else {
err = ip_sioctl_flags_tail(ipif, lifr->lifr_flags,
q, mp, B_TRUE);
}
Note in particular the last argument to ip_sioctl_flags_tail() -- this
causes it to *always* bring the ipif back up, contradicting the
application's request to bring the ipif down. In contrast, the normal
(non-asynchronous path) is in ip_sioctl_flags() is careful to do:
ipif_down_tail(ipif);
}
return (ip_sioctl_flags_tail(ipif, flags, q, mp, need_up));
^^^^^^^
This seems like a pretty serious bug. Am I missing something here?
One possible fix would be to remove the need_up argument and instead track
this state on the ipif via a new ipif_need_up field.
--
meem