I think I see the cause of this bug. Upon resume, as previously mentioned, the client receives a SIGTHAW and goes back through INIT_REBOOT (refresh_smachs() -> refresh_smach() -> dhcp_init_reboot()). If the server sends a DHCPNAK to the client (e.g., because the lease has expired), the client ends up clearing IFF_UP (accept_v4_acknak() -> dhcp_restart() -> deprecate_leases() -> remove_lease() -> unplumb_lif() -> canonize_lif()). Next, dhcp_restart() causes dhcp_start() to be called, which sets the state back to INIT. As part of doing this, we *usually* end up calling open_ip_lif(), which sets IFF_UP again. However, since we were only in INIT_REBOOT, the call to open_ip_lif() is skipped, and the interface remains down. As a result, every DHCP packet the client sends is dropped by the stack, and a new lease is never obtained.
I see a few possible fixes. My preference would be to continue shifting the DHCP client out of the business of messing with IFF_UP. That is, I think we could just have canonize_lif() leave the interface IFF_UP (while still setting the address to 0.0.0.0). This would be a trivial change, and it would line up well with my proposed changes to stop the DHCP client from monitoring the IFF_UP flag (which are needed to allow DHCP to work smoothly for IPMP test addresses[1]), since the client would no longer need to concern itself with whether it or another process cleared IFF_UP.

[1] http://opensolaris.org/jive/thread.jspa?threadID=52800&tstart=15

-- meem

_______________________________________________
networking-discuss mailing list
