[cc list trimmed a bit]
On Tue, Jul 11, 2000 at 12:50:32PM +0200, Alexander Demenshin wrote:
> - Traffic generator used on _local_ interface:
>
> > A lot of fragmented packets:
>
> ifconfig lo mtu 256
> ping -f -s 8192 127.0.0.1
>
> > A lot of TCP traffic (connect/transfer/disconnect);
> > MTU does not matter.
>
> In my tests I used the following rules for iptables:
>
> iptables -t mangle -A PREROUTING -j QUEUE
> iptables -t mangle -A OUTPUT -j QUEUE
>
> I assume there are no other rules; but the problem occurs _only_
> when the QUEUE target is in effect - other rules do not matter as long
> as there are no QUEUE targets or if packets are not accepted in userspace.

The only thing I can see in ipqueue is that it turns off local bottom halves
for a long time during packet receive. That could probably trigger other
races.

> In case if I use table 'filter' it also occurs (so nothing magical
> in 'mangle' table).
>
> So, once the rules above are in effect and the userspace module is running,
> a system lockup occurs after the traffic generator has been running for a
> certain period of time (in my case, after processing ca. 300K packets; but
> it depends - be patient :).
>
> No Oops, no other kernel messages, _nothing_ except that SysRq is active.
>
> Examining the code under EIP shows that the lockup occurs at:
>
> - In case of TCP traffic:
>
> src/net/ipv4/tcp_timer.c:690
>
> --- src/net/ipv4/tcp_timer.c:690 tcp_synack_timer() ---
> /* Drop this request */
> write_lock(&tp->syn_wait_lock); /* <<< AT THIS PLACE */

This one is strange. Any chance of getting a multi-CPU backtrace for this?
(Install kdb from oss.sgi.com:/projects/kdb/, press pause during a hang,
enter bt, then switch to the other CPUs using the cpu command and backtrace
them too.)

> *reqp = req->dl_next;
> write_unlock(&tp->syn_wait_lock);
>
> --- CUT ---
>
> - In case of ICMP (fragmented) traffic:
>
> --- src/net/ipv4/ip_fragment.c:202 ip_expire() ---
> spin_lock(&ipfrag_lock); /* <<< AT THIS PLACE */

The fragment locking is known to be buggy. It should be fixed in 2.4.0pre3.
There was also a NAT bug where ip_defrag was called without bottom halves
turned off, which could cause deadlocks too, but that should already be fixed
(all ip_defrag calls in netfilter/* should be guarded by a
local_bh_disable()/local_bh_enable() pair).
-Andi
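
To illustrate why the bottom-half guard around ip_defrag matters, here is a minimal userspace sketch. It is a toy single-CPU model, not kernel code: every toy_* function and every state variable below is hypothetical and only stands in for the real lock, the BH-disable count, and the fragment expiry timer. It assumes the timer handler takes the same lock as the defragmentation path, as the report above suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model; none of this is kernel code. */
static bool frag_lock_held;    /* stands in for ipfrag_lock */
static int  bh_disable_count;  /* stands in for the BH-disable count */
static bool bh_pending;        /* an expiry timer is waiting to run */
static bool deadlocked;        /* set when the model would spin forever */

/* Models the timer handler (ip_expire in the report), which takes the lock. */
static void toy_expire_bh(void)
{
    if (frag_lock_held) {
        /* A real spinlock would spin here forever: the holder is the
         * code we interrupted, and it cannot run until we return. */
        deadlocked = true;
        return;
    }
    frag_lock_held = true;   /* expire the fragment queue */
    frag_lock_held = false;
}

/* A point where an interrupt may arrive and run pending bottom halves. */
static void toy_interrupt_point(void)
{
    if (bh_pending && bh_disable_count == 0) {
        bh_pending = false;
        toy_expire_bh();
    }
}

static void toy_local_bh_disable(void)
{
    bh_disable_count++;
}

static void toy_local_bh_enable(void)
{
    if (--bh_disable_count == 0)
        toy_interrupt_point();   /* deferred work runs once re-enabled */
}

/* Models the defragmentation path: it holds the lock while interruptible. */
static void toy_defrag(void)
{
    frag_lock_held = true;
    toy_interrupt_point();       /* interrupt arrives while the lock is held */
    frag_lock_held = false;
}

/* Runs one scenario; returns true if the model hit the deadlock. */
bool toy_run(bool guarded)
{
    frag_lock_held = false;
    bh_disable_count = 0;
    deadlocked = false;
    bh_pending = true;           /* a timer is about to fire */

    if (guarded)
        toy_local_bh_disable();
    toy_defrag();
    if (guarded)
        toy_local_bh_enable();

    return deadlocked;
}
```

In this model toy_run(false) reports the deadlock, while toy_run(true) does not, because the pending handler only runs after the lock has been dropped.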