I think I see the cause of this bug.  Upon resume, as previously
mentioned, the client receives a SIGTHAW and goes back through INIT_REBOOT
(refresh_smachs() -> refresh_smach() -> dhcp_init_reboot()).  If the
server sends a DHCPNAK to the client (e.g., because the lease has expired)
then the client will end up clearing IFF_UP (accept_v4_acknak() ->
dhcp_restart() -> deprecate_leases() -> remove_lease() -> unplumb_lif() ->
canonize_lif()).  Next, dhcp_restart() causes dhcp_start() to be called,
which sets the state back to INIT.  As part of doing this, *usually* we'll
end up calling open_ip_lif(), which will set IFF_UP again.  However, since
we were only in INIT_REBOOT, the call to open_ip_lif() is skipped, and
the interface remains down.  As a result, all DHCP packets sent by the
client are dropped by the stack, and a new lease is never obtained.
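
To make the sequence easier to follow, here is a toy model of it in C.  It
is only a sketch: every toy_* name is made up for illustration and none of
this is the real dhcpagent code.  In particular, the "skip the reopen when
we came from INIT_REBOOT" test is my stand-in for whatever condition
actually causes open_ip_lif() to be bypassed.

```c
#include <stdbool.h>

/* Toy model only: these are not dhcpagent's real types or functions. */
enum toy_state { TOY_INIT, TOY_INIT_REBOOT, TOY_BOUND };

struct toy_lif {
    enum toy_state state;
    bool up;        /* stands in for IFF_UP */
    bool have_addr; /* false models an address of 0.0.0.0 */
};

/* Models today's canonize_lif(): drop the address and clear IFF_UP. */
static void toy_canonize(struct toy_lif *lif)
{
    lif->have_addr = false;
    lif->up = false;
}

/*
 * Models the DHCPNAK path (dhcp_restart() followed by dhcp_start()).
 * Assumption: the step that would set IFF_UP again (open_ip_lif() in the
 * real client) is skipped when the previous state was INIT_REBOOT.
 */
static void toy_handle_nak(struct toy_lif *lif)
{
    enum toy_state prev = lif->state;

    toy_canonize(lif);
    if (prev != TOY_INIT_REBOOT)
        lif->up = true;     /* models open_ip_lif() */
    lif->state = TOY_INIT;
}

/* The stack drops packets sent on an interface that is not up. */
static bool toy_can_send(const struct toy_lif *lif)
{
    return lif->up;
}
```

Running toy_handle_nak() on a lif that starts in TOY_INIT_REBOOT leaves it
in TOY_INIT with up == false, so toy_can_send() returns false: the model's
version of the client never obtaining a new lease.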

I see a few possible fixes.  My preference would be to continue to shift
the DHCP client out of the business of messing with IFF_UP.  That is, I
think we could just have canonize_lif() leave the interface IFF_UP (but
still set the address to 0.0.0.0).  This would be a trivial change, and
would line up well with my proposed changes to have the DHCP client no
longer monitor the IFF_UP flag (which is needed to allow DHCP to work
smoothly for IPMP test addresses[1]), since the client wouldn't need to
concern itself with whether it or another process cleared IFF_UP.
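
Roughly, the change I have in mind would look like the sketch below.  The
struct and function names are hypothetical (this is not the real
canonize_lif() or its lif type); the point is only that the address gets
reset while the up flag is left exactly as we found it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a logical interface; not dhcpagent's type. */
struct sketch_lif {
    uint32_t addr;  /* IPv4 address; 0 models 0.0.0.0 */
    bool up;        /* stands in for IFF_UP */
};

/*
 * Proposed behavior: reset the address to 0.0.0.0 but do not touch the
 * up flag, whoever set or cleared it.
 */
static void sketch_canonize(struct sketch_lif *lif)
{
    lif->addr = 0;
    /* Deliberately no change to lif->up. */
}
```

With this shape, an interface that was up before the DHCPNAK is still up
afterwards, so it no longer matters whether a later step re-opens it.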

[1] http://opensolaris.org/jive/thread.jspa?threadID=52800&tstart=15

-- 
meem