On Wed, 2017-11-22 at 16:18 +0100, Piotr Figiel wrote: > Hi, > I'm looking into couple issues we stumbled on with pppd and > NetworkManager. It's tested on NetworkManager 1.8.4, ModemManager > 1.6.8 and pppd 2.4.7. In general there are sporadic issues with > reconnecting when pppd exits because of LCP echo timeout and when it > exits due to other reasons. > > In simplest scenario with active pppd connection when pppd crashes/is > forcibly killed and nm pppd plug-in doesn't have a chance to send the > status update via D-Bus to NM the connection is not deactivated > (nmcli > c shows it's up). > I've done some analysis and NM does notice the pppd process exiting, > it enters nm-device-modem.c:ppp_failed() but because the "may-fail" > ipv4 setting defaults to TRUE not much happens (the connection losses > IP address and gateway setting but is still active). > The behavior doesn't seem correct as according to the nm-settings(5) > the connection should fail if none of ipv4 and ipv6 are available > (even when they have may-fail setting to TRUE). > > There are actually more issues with nm-device-modem.c:ppp_failed(). > Normally when pppd session ends (e.g. because of LCP echo timeout) it > seems there are 2 messages sent from pppd which trigger ppp_failed(): > SetState with NM_PPP_STATUS_DISCONNECT and NM_PPP_STATUS_DEAD. > Additionally pppd process exiting triggers the ppp_failed() for the > 3rd time. In such conditions NM deactivates connection with error: > <warn> [1511357452.8500] device (ttyACM0): PPP failure in unexpected > state 100 > This seems to be a side effect of the may-fail logic and the fact > that > multiple events result in call to ppp_failed() but this side effect > deactivates the device and triggers reconnect later on (ie fixes the > issue). > > This though doesn't work always like that: I observed that > sporadically the nm pppd plugin doesn't send the NM_PPP_STATUS_DEAD > state (observed with dbus-monitor). I think it's related with > g_dbus_proxy_call() being used in the plugin code, which is > asynchronous and it may sometimes not complete before pppd exit. > In such case and some amount of bad luck ppp_failed() is called > because of NM_PPP_STATUS_DISCONNECT and pppd exiting only (i.e. only > two times). If in such case ip4_state is configured (with may-fail) > and e.g. ip6_state == IP_CONF then the connection will not be > deactivated and reconnection won't be triggered. > > I think for the LCP echo timeout error case setting may-fail to FALSE > on ipv4 on ppp connections should solve the issue. But if anything > from above makes sense to you it seems there are at least couple bugs > there. I think the simplest solution would be to replace > nm-device-modem.c:ppp_failed() contents with: > > nm_device_state_changed (device, NM_DEVICE_STATE_FAILED, reason); > > Or is there any case when pppd failure should leave the connection > active? > Additionally perhaps g_dbus_proxy_call in > nm-pppd-plugin.c:nm_phasechange() could be changed to sync > counterpart.
Hi Piotr, may-fail is supposed to behave as you explained, and in general we should implement the logic, that a failure of one address family does not necessarily fail the entire connection. I think, at the very least, the if-else-if in ppp_failed() https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/src/devices/wwan/nm-device-modem.c?id=ca26d1ae07a4e8a7036051301d0103ab0822fd44#n89 should be if (ip4_state_in_conf()) schedule_ip4_timeout() else if (ip4_state_done()) ip4_method_failed() /*BREAK*/ if (ip6_state_in_conf()) schedule_ip6_timeout() else if (ip6_state_done()) ip6_method_failed() But I think (and I think with this I agree with you), if ppp fails, then the entire ppp connection failed. Not just one address family. So, your device_state_change(STATE_FAILED) makes sense to me. best, Thomas
signature.asc
Description: This is a digitally signed message part
_______________________________________________ networkmanager-list mailing list [email protected] https://mail.gnome.org/mailman/listinfo/networkmanager-list
