On 09:20, Gerrit Renker wrote:
> thank you for continued testing. I spent a long while yesterday evening
> isolating
> possible causes. Here are further pointers
>
> 1. Can you please tell us which kernel you are using?
I checked 2.6.19.x (which works), 2.6.20, 2.6.20.1 and 2.6.21-rc1
(which all have the bug). 2.6.21-rc2 wouldn't compile on my system
due to some APIC issue.
> If it is a very recent
> one, can you please check whether you get the following message in your
> syslog:
> "[...] listen_overflow!"
>
> If yes (dccp_debug should be turned on), then very likely setting
> listen(fd, 1)
> instead of listen(fd,0) may remove any strange effects.
> The reason is a recent change in sk_accept_queue_is_full which causes a
> different
> treatment of zero-sized listen-accept queues.
Will test this evening and report tomorrow.
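
For reference, the change suggested above could look roughly like the
following sketch. The helper names effective_backlog() and
listen_min_one() are made up for illustration and are not part of
paraslash or the kernel; the only real call is listen(2).

```c
#include <sys/socket.h>

/*
 * Hypothetical helper: map a requested backlog to one that is at least 1,
 * so that a zero-sized accept queue is never requested.
 */
static int effective_backlog(int requested)
{
	return requested < 1 ? 1 : requested;
}

/*
 * Hypothetical wrapper around listen(2) that applies the clamp above.
 * Returns whatever listen() returns (0 on success, -1 on error).
 */
static int listen_min_one(int fd, int requested)
{
	return listen(fd, effective_backlog(requested));
}
```

A caller that currently does listen(fd, 0) would call
listen_min_one(fd, 0) instead, which ends up as listen(fd, 1).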
> 2. With the most recent davem-2.6 kernel I was not able to reproduce this
> bug. It
> should, after some more thought, really make no difference whether you are
> using
> loopback (127.0.0.1) or not.
I can try this kernel as well. I'm currently downloading
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.git
Hope that's the right one.
> 3. I analyzed the reverted patch you identified. There is indeed a loophole
> (which
> has not become visible so far), hence I will send
> (a) an update of this patch
> (b) a second patch to do with timer initialisation of child sockets.
> In particular (a) may help.
OK. Are these patches against the current linus-tree?
> 4. It may be worth trying a different application, e.g.
>
> http://www.erg.abdn.ac.uk/users/gerrit/dccp/apps/ttcp_dccp.tar.gz
> in order to find out which combination of system calls triggers the bug
> condition.
Will test and report.
> I managed to get the paraslash application built, but could not figure out
> how to
> populate the user lists and required configuration files.
That's explained in the INSTALL file. But you don't need any
configuration files and I _think_ you don't even need a paraslash
user to reproduce the bug, an empty ~/.paraslash/server.users should
do. Just start para_server with the autoplay (-a) option, i.e.
para_server -a --random_dir=/some/dir/containing/an/mp3/file
Then
para_recv -r dccp
triggers the bug.
> I don't understand your code fully yet, but with the more recent stack
> trace I
> was wondering whether this has to do with setting the listen socket
> non-blocking
> (mark_fd_nonblock), which is done both in sender and receiver.
IMHO it's considered good practice to set all fds which are used for
select() to non-blocking mode. AFAIR the reason is the situation
where a network packet arrives but is discarded because of a checksum
error. In this case it might happen that select() indicates readability
of an fd, but a subsequent read() blocks nevertheless. Maybe it's
unnecessary to set an fd to non-blocking mode if it is only used
for writing. But it won't hurt either, so...
Thanks
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe

