Our developer still thinks the race condition would be elsewhere.

In both i40evf_reset_task and i40evf_open, i40evf_configure is being called and 
i40evf_configure calls i40evf_set_rx_mode.
Then in 40evf_set_rx_mode, there is the while loop to check 
__I40EVF_IN_CRITICAL_TASK. 

                while (test_and_set_bit(__I40EVF_IN_CRITICAL_TASK, 
&adapter->crit_section))
                       dev_err(&adapter->pdev->dev, "Failed to get lock in 
%s\n", __func__);

i40evf_up_complete is being called after the call of i40evf_configure. 

Todd Fujinaka
Software Application Engineer
Datacenter Engineering Group
Intel Corporation
todd.fujin...@intel.com
(503) 712-4565


-----Original Message-----
From: Codrut Grosu [mailto:cogr...@ixiacom.com] 
Sent: Thursday, August 3, 2017 7:12 AM
To: Alexander Duyck <alexander.du...@gmail.com>
Cc: e1000-devel@lists.sourceforge.net; Tudor Cornea <tcor...@ixiacom.com>
Subject: Re: [E1000-devel] Possible bug in the i40evf driver

  Hi Alex,

  I apologize for my mistake and for being unclear. I meant "ndo_stop".

  You are right that they should be called with the RTNL lock held. And we were 
doing that. What is probably wrong is that we were not calling
        clear_bit(__LINK_STATE_START, &dev->state); This only happens if we 
call dev_close.
If this bit is not cleared, netif_running will return true in the reset task. 
Then napi_disable will be called again. So it's not about the RTNL lock but 
about the __LINK_STATE_START bit.

  I agree that calling ndo_stop directly is probably wrong. And so far this is 
the only way we can make the driver hang.

 I still think that reset and open could race. As I said I can't reproduce 
this. But the second call to netif_running and napi_enable in the reset task is 
not protected by __I40EVF_IN_CRITICAL_TASK. And the call in i40evf_open to 
i40evf_up_complete is not protected either.

  Best wishes,
    Codrut 

-----Original Message-----
From: Alexander Duyck [mailto:alexander.du...@gmail.com]
Sent: Thursday, August 03, 2017 4:08 AM
To: Codrut Grosu <cogr...@ixiacom.com>
Cc: e1000-devel@lists.sourceforge.net
Subject: Re: [E1000-devel] Possible bug in the i40evf driver

On Tue, Aug 1, 2017 at 8:29 AM, Codrut Grosu <cogr...@ixiacom.com> wrote:
>    Hi,
>
>   We believe there might be a bug in the i40evf driver.
>
>   We think that there is race issue between i40evf_reset_task and i40evf_down 
> / i40evf_open.
> The reason is that the functions napi_enable / napi_disable must be 
> called in pairs in order not to loop indefinitely (or crash).
>
> Consider the following:
>
> ifconfig eth1 mtu 1000 & ifconfig eth1 up
>
> What happens now is that the change of mtu schedules a reset. Suppose 
> the reset task starts, and the first call to netif_running (after
> continue_reset) returns false. Before the thread reaches the second 
> call to netif_running, i40evf_open starts in another thread. Then the second 
> netif_running in reset will return true, and we will have 2 consecutive calls 
> of napi_enable.
>
> We could not reproduce this particular situation in practice (probably due to 
> the short timing).
> However, we did hang the driver using a call to ndo_close() followed 
> quickly by "ethtool -G eth1 rx 4096". In this case netif_running will 
> return true always (as we bypassed the call to dev_close), the reset 
> will be scheduled before the interface finishes going down, and 2 calls to 
> napi_disable will happen.

So this last statement isn't exactly clear. There isn't an ndo_close() in the 
kernel last I knew. There is an ndo_stop() and an i40evf_close(), but there 
isn't an ndo_close(). Can you clarify which one of these you were calling?

I ask because ndo_stop() and i40evf_close() should only be called with the RTNL 
lock held. It sounds like you may not be doing that and that could cause a 
collsion with something like an "ethtool -G" command because the ethtool ioctl 
is taking the RTNL lock to avoid such collisions and if your call that is 
getting into i40evf_close() isn't holding the RTNL then that might explain the 
issue.

- Alex
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most engaging tech 
sites, Slashdot.org! http://sdm.link/slashdot 
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit 
http://communities.intel.com/community/wired

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit 
http://communities.intel.com/community/wired

Reply via email to