Our developer still thinks the race condition would be elsewhere. Both i40evf_reset_task and i40evf_open call i40evf_configure, and i40evf_configure in turn calls i40evf_set_rx_mode. Then in i40evf_set_rx_mode, there is the while loop that checks __I40EVF_IN_CRITICAL_TASK:
    while (test_and_set_bit(__I40EVF_IN_CRITICAL_TASK,
                            &adapter->crit_section))
        dev_err(&adapter->pdev->dev, "Failed to get lock in %s\n",
                __func__);

i40evf_up_complete is called after the call to i40evf_configure.

Todd Fujinaka
Software Application Engineer
Datacenter Engineering Group
Intel Corporation
todd.fujin...@intel.com
(503) 712-4565

-----Original Message-----
From: Codrut Grosu [mailto:cogr...@ixiacom.com]
Sent: Thursday, August 3, 2017 7:12 AM
To: Alexander Duyck <alexander.du...@gmail.com>
Cc: e1000-devel@lists.sourceforge.net; Tudor Cornea <tcor...@ixiacom.com>
Subject: Re: [E1000-devel] Possible bug in the i40evf driver

Hi Alex,

I apologize for my mistake and for being unclear. I meant "ndo_stop". You are right that they should be called with the RTNL lock held, and we were doing that. What is probably wrong is that we were not calling

    clear_bit(__LINK_STATE_START, &dev->state);

which only happens if we call dev_close. If this bit is not cleared, netif_running will return true in the reset task, and napi_disable will be called again. So it's not about the RTNL lock but about the __LINK_STATE_START bit. I agree that calling ndo_stop directly is probably wrong, and so far this is the only way we can make the driver hang.

I still think that reset and open could race. As I said, I can't reproduce this. But the second call to netif_running and napi_enable in the reset task is not protected by __I40EVF_IN_CRITICAL_TASK, and the call in i40evf_open to i40evf_up_complete is not protected either.

Best wishes,
Codrut

-----Original Message-----
From: Alexander Duyck [mailto:alexander.du...@gmail.com]
Sent: Thursday, August 03, 2017 4:08 AM
To: Codrut Grosu <cogr...@ixiacom.com>
Cc: e1000-devel@lists.sourceforge.net
Subject: Re: [E1000-devel] Possible bug in the i40evf driver

On Tue, Aug 1, 2017 at 8:29 AM, Codrut Grosu <cogr...@ixiacom.com> wrote:
> Hi,
>
> We believe there might be a bug in the i40evf driver.
>
> We think there is a race between i40evf_reset_task and i40evf_down /
> i40evf_open.
> The reason is that the functions napi_enable / napi_disable must be
> called in pairs in order not to loop indefinitely (or crash).
>
> Consider the following:
>
> ifconfig eth1 mtu 1000 & ifconfig eth1 up
>
> What happens now is that the change of MTU schedules a reset. Suppose
> the reset task starts, and the first call to netif_running (after
> continue_reset) returns false. Before the thread reaches the second
> call to netif_running, i40evf_open starts in another thread. Then the
> second netif_running in reset will return true, and we will have two
> consecutive calls to napi_enable.
>
> We could not reproduce this particular situation in practice (probably
> due to the short timing).
> However, we did hang the driver using a call to ndo_close() followed
> quickly by "ethtool -G eth1 rx 4096". In this case netif_running will
> always return true (as we bypassed the call to dev_close), the reset
> will be scheduled before the interface finishes going down, and two
> calls to napi_disable will happen.

So this last statement isn't exactly clear. There isn't an ndo_close() in the kernel, last I knew. There is an ndo_stop() and an i40evf_close(), but there isn't an ndo_close(). Can you clarify which of these you were calling?

I ask because ndo_stop() and i40evf_close() should only be called with the RTNL lock held. It sounds like you may not be doing that, and that could cause a collision with something like an "ethtool -G" command, because the ethtool ioctl takes the RTNL lock to avoid exactly such collisions. If the call that is getting into i40evf_close() isn't holding the RTNL, that might explain the issue.

- Alex

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most engaging
tech sites, Slashdot.org!
http://sdm.link/slashdot
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired