On 08.12.2016 23:18, Pavel Machek wrote:
> On Thu 2016-12-08 23:12:10, Lino Sanfilippo wrote:
>> Hi,
>> 
>> On 08.12.2016 22:54, Pavel Machek wrote:
>> > On Thu 2016-12-08 21:32:12, Lino Sanfilippo wrote:
>> >> Hi,
>> >> 
>> >> On 08.12.2016 00:15, Francois Romieu wrote:
>> >> > Lino Sanfilippo <linosanfili...@gmx.de> :
>> >> >> The driver uses a private lock for synchronization between the xmit
>> >> >> function and the xmit completion handler, but since the NETIF_F_LLTX 
>> >> >> flag
>> >> >> is not set, the xmit function is also called with the xmit_lock held.
>> >> >> 
>> >> >> On the other hand the xmit completion handler first takes the private 
>> >> >> lock
>> >> >> and (in case that the tx queue has been stopped) the xmit_lock, leading
>> >> >> to a reverse locking order and the potential danger of a deadlock.
>> >> > 
>> >> > netif_tx_stop_queue is used by:
>> >> > 1. xmit function before releasing lock and returning.
>> >> > 2. sxgbe_restart_tx_queue()
>> >> >    <- sxgbe_tx_interrupt
>> >> >    <- sxgbe_reset_all_tx_queues()
>> >> >       <- sxgbe_tx_timeout()
>> >> > 
>> >> > Given xmit won't be called again until tx queue is enabled, it's not 
>> >> > clear
>> >> > how a deadlock could happen due to #1.
>> >> > 
>> >> 
>> >> 
>> >> After spending more thoughts on this I tend to agree with you. Yes, we 
>> >> have the
>> >> different locking order for the xmit_lock and the private lock in two 
>> >> concurrent
>> >> threads. And one of the first things one learns about locking is that 
>> >> this is a
>> >> good way to create a deadlock sooner or later. But in our case the 
>> >> deadlock 
>> >> can only occur if the xmit function and the tx completion handler 
>> >> perceive different
>> >>  states for the tx queue, or to be more specific: 
>> >> the completion handler sees the tx queue in state "stopped" while the 
>> >> xmit handler 
>> >> sees it in state "running" at the same time. Only then both functions 
>> >> would try to
>> >> take both locks, which could lead to a deadlock.
>> >> 
>> >> OTOH Pavel said that he actually could produce a deadlock. Now I wonder 
>> >> if this is caused
>> >> by that locking scheme (in a way I have not figured out yet) or if it is 
>> >> a different issue.
>> > 
>> > Pavel has some problems, but that's on different hardware.. and it is
>> > possible that it is deadlock (or something else) somewhere else.
>> > 
>> 
>> Right, it is different hardware. But the locking situation in xmit function 
>> and tx completion handler
>> is very similar in both drivers. So if a deadlock is not possible in sxgbe 
>> it should 
>> also not be possible in stmmac (at least not due to the different locking 
>> order). 
>> So maybe there is no real issue that we could fix with removing the private 
>> lock and we should
>> keep it as it is.
> 
> Well.. the locking is pretty confused there. Having private lock that
> mirrors lock from network layer is confusing and ugly... that should
> be reason to fix it.
>                                                                       Pavel
> 

Ok. Then I will resend the patches for both drivers with a different (less 
dramatic) commit message
in which the change is not longer described as a fix for deadlock but rather as 
a code 
cleanup/improvement, ok?

Regards,
Lino

Reply via email to