Re: Detailed report on SMB-build lockups [seems that it is locking problem in networking code] (2.4.0-test2-ac2 and later)

Andi Kleen Tue, 11 Jul 2000 04:31:28 -0700
[cc list trimmed a bit]

On Tue, Jul 11, 2000 at 12:50:32PM +0200, Alexander Demenshin wrote:
>               - Traffic generator used on _local_ interface:
>               
>                       > A lot of fragmented packets:
>                       
>                               ifconfig lo mtu 256
>                               ping -f -s 8192 127.0.0.1
>                               
>                       > A lot of TCP traffic (connect/transfer/disconnect);
>                       > MTU does not matter.
>                       
>       In my tests I used the following rules for iptables:
>       
>               iptables -t mangle -A PREROUTING -j QUEUE
>               iptables -t mangle -A OUTPUT     -j QUEUE
>               
>       I assume there are no other rules; but the problem occurs _only_
>       when QUEUE target is in effect - other rules does not matter as long
>       as there is no QUEUE targets or if packets are not accepted in userspace.

The only thing I can see in ipqueue is that it keeps local bottom halves
disabled for a long time during packet receive. That could probably trigger
other races.

>       In case if I use table 'filter' it also occurs (so nothing magical
>       in 'mangle' table).
>       
>       So, once rules above are in effect, userspace module is running, and after
>       certain period of time running traffic generator system lockup occurs
>       (in my case - after processing of ca. 300K packets; but it depends - 
>       be patient :).
>       
>       No OOPs, no other kernel messages, _nothing_ except SysRq is active.
>       
>       Examining of code under EIP shows, that lockup occurs at:
>       
>               - In case of TCP traffic:
>               
>                       src/net/ipv4/tcp_timer.c:690
>                       
> --- src/net/ipv4/tcp_timer.c:690 tcp_synack_timer() ---
>                                 /* Drop this request */
>                                 write_lock(&tp->syn_wait_lock);               /* <<< AT THIS PLACE */

This one is strange. Any chance of getting a multi-CPU backtrace for this?
(Install kdb from oss.sgi.com:/projects/kdb/ , press Pause during a hang,
enter bt, then switch to the other CPUs using the cpu command and backtrace
them too.)


>                                 *reqp = req->dl_next;
>                                 write_unlock(&tp->syn_wait_lock);
> 
> --- CUT ---
> 
>               - In case of ICMP (fragmented) traffic:
>               
> --- src/net/ipv4/ip_fragment:202 ip_expire ---
>         spin_lock(&ipfrag_lock);                                      /* <<< AT THIS PLACE */

The fragment locking is known to be buggy. It should be fixed in 2.4.0pre3.
There was also a NAT bug where ip_defrag was called without bottom halves
disabled, which could cause deadlocks too, but that should already be fixed
(all ip_defrag calls in netfilter/* should be guarded by
local_bh_disable/enable).
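
To illustrate why the guard matters, here is a minimal userspace sketch of the
failure mode. It is a toy single-CPU model, not kernel code: every name in it
(toy_cpu, toy_defrag, toy_raise_bh and so on) is hypothetical, the "lock" is a
plain flag, and the "bottom half" stands in for something like a timer handler
that takes the same lock as the interrupted code. It only shows the ordering
problem, assuming the bottom half runs on the CPU that already holds the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model; NOT kernel code. All names are hypothetical. */
struct toy_cpu {
    int  bh_disable_depth; /* nesting count of "bottom halves disabled" */
    bool bh_pending;       /* a bottom half was raised but deferred */
    bool frag_lock_held;   /* stands in for a non-recursive spinlock */
    int  deadlocks;        /* times the BH found the lock already held */
    int  bh_runs;          /* times the BH ran to completion */
};

/* The bottom half needs the same lock as the code it may interrupt. */
static void toy_bh_handler(struct toy_cpu *cpu)
{
    if (cpu->frag_lock_held) {
        /* A real spinlock would spin forever here: the holder is the
         * interrupted code on this same CPU, which cannot make progress. */
        cpu->deadlocks++;
        return;
    }
    cpu->frag_lock_held = true;
    cpu->bh_runs++;
    cpu->frag_lock_held = false;
}

/* Raise a bottom half: run it now, or defer it if BHs are disabled. */
static void toy_raise_bh(struct toy_cpu *cpu)
{
    if (cpu->bh_disable_depth > 0) {
        cpu->bh_pending = true;
        return;
    }
    toy_bh_handler(cpu);
}

static void toy_bh_disable(struct toy_cpu *cpu)
{
    cpu->bh_disable_depth++;
}

/* Re-enable BHs; run any deferred bottom half once the depth hits zero. */
static void toy_bh_enable(struct toy_cpu *cpu)
{
    assert(cpu->bh_disable_depth > 0);
    if (--cpu->bh_disable_depth == 0 && cpu->bh_pending) {
        cpu->bh_pending = false;
        toy_bh_handler(cpu);
    }
}

/* Non-BH path that takes the lock; a BH is raised mid critical section. */
static void toy_defrag(struct toy_cpu *cpu, bool guard)
{
    if (guard)
        toy_bh_disable(cpu);
    cpu->frag_lock_held = true;
    toy_raise_bh(cpu);            /* e.g. a timer firing at a bad moment */
    cpu->frag_lock_held = false;
    if (guard)
        toy_bh_enable(cpu);
}
```

Calling toy_defrag() with guard set to false makes the handler hit the held
lock (counted in deadlocks); with guard set to true the handler is deferred
until the lock has been dropped and then runs normally.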


-Andi