On 01/25/2018 09:03 PM, Eric Dumazet wrote:
> On Thu, 2018-01-25 at 20:43 +0300, Alexey Kodanev wrote:
>> ccid2_hc_tx_rto_expire() timer callback always restarts the timer
>> again and can run indefinitely (unless it is stopped outside), and
>> after commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at
>> dismantle time"), which moved sk_stop_timer() to sk_destruct(),
>> this started to happen quite often. The timer prevents releasing
>> the socket, as a result, sk_destruct() won't be called.
>>
>> Found with LTP/dccp_ipsec tests running on the bonding device,
>> which later couldn't be unloaded after the tests were completed:
>>
>>   unregister_netdevice: waiting for bond0 to become free. Usage count = 148
>>
>> Fixes: 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time")
>> Signed-off-by: Alexey Kodanev <[email protected]>
>> ---
> 
> I understand your fix, but not why commit 120e9dabaf55 is bug origin.
> 
> Looks like this always had been buggy : Timer logic should have checked
> socket state from day 0.

Hi Eric,

Agree, I'll change to the initial commit id. I've added commit 120e9dabaf55
because ccid_hc_tx_delete() (and sk_stop_timer()) moved from dccp_destroy_sock()
to sk_destruct(), and only after this change the chances that the timer won't
stop increased significantly.

> 
> I did not move sk_stop_timer() to sk_destruct()
> 

ccid_hc_tx_delete() includes sk_stop_timer():


ccid_hc_tx_delete()
    ccid2_hc_tx_exit(struct sock *sk)
        sk_stop_timer(sk, &hc->tx_rtotimer);

    ccid3_hc_tx_exit(struct sock *sk)
        sk_stop_timer(sk, &hc->tx_no_feedback_timer);


Thanks,
Alexey

Reply via email to