On 09:20, Gerrit Renker wrote:

> thank you for continued testing. I spent a long while yesterday evening
> isolating possible causes. Here are further pointers
> 
> 1. Can you please tell us which kernel you are using?

I checked 2.6.19.x (which works), 2.6.20, 2.6.20.1 and 2.6.21-rc1
(which all have the bug). 2.6.21-rc2 wouldn't compile on my system
due to some APIC issue.

>  If it is a very recent
>    one, can you please check whether you get the following message in your
>    syslog:
>                 "[...] listen_overflow!"
> 
>    If yes (dccp_debug should be turned on), then very likely setting
>    listen(fd, 1) instead of listen(fd, 0) may remove any strange effects.
>    The reason is a recent change in sk_accept_queue_is_full which causes a
>    different treatment of zero-sized listen-accept queues.

Will test this evening and report tomorrow.
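
A minimal sketch of the suggested change, assuming a plain IPv4 listener
bound to the loopback address. The helper name make_listener() and its
parameters are made up for illustration and are not taken from paraslash;
any DCCP-specific setup (such as a service code) is left out, and whether
a backlog of 1 actually avoids the problem is still untested.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallbacks for older libc headers; these are the values Linux uses. */
#ifndef SOCK_DCCP
#define SOCK_DCCP 6
#endif
#ifndef IPPROTO_DCCP
#define IPPROTO_DCCP 33
#endif

/*
 * Hypothetical helper: create a socket of the given type and protocol,
 * bind it to 127.0.0.1:port (port 0 lets the kernel choose) and start
 * listening with the given backlog.
 *
 * Returns the listening fd on success, -1 on error.
 */
int make_listener(int type, int proto, unsigned short port, int backlog)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, type, proto);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(port);
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
			listen(fd, backlog) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

With this, the comparison would be make_listener(SOCK_DCCP, IPPROTO_DCCP,
port, 0) versus the same call with a backlog of 1.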

> 2. With the most recent davem-2.6 kernel I was not able to reproduce this
>    bug. It should, after some more thought, really make no difference
>    whether you are using loopback (127.0.0.1) or not.

I can try this kernel as well. I'm currently downloading

        git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.git
        
Hope that's the right one.

> 3. I analyzed the reverted patch you identified. There is indeed a loophole
>    (which has not become visible so far), hence I will send
>    (a) an update of this patch 
>    (b) a second patch to do with timer initialisation of child sockets.
>    In particular (a) may help. 

OK. Are these patches against the current linus-tree?

> 4. It may be worth trying a different application, e.g. 
> 
>        http://www.erg.abdn.ac.uk/users/gerrit/dccp/apps/ttcp_dccp.tar.gz     
>    in order to find out which combination of system calls triggers the bug
>    condition.

Will test and report.


>    I managed to get the paraslash application built, but could not figure out
>    how to populate the user lists and required configuration files.

That's explained in the INSTALL file. But you don't need any
configuration files, and I _think_ you don't even need a paraslash
user to reproduce the bug; an empty ~/.paraslash/server.users should
do. Just start para_server with the autoplay (-a) option, i.e.

        para_server -a --random_dir=/some/dir/containing/an/mp3/file

Then
        para_recv -r dccp

triggers the bug.

>    I don't understand your code fully yet, but with the more recent stack
>    trace I was wondering whether this has to do with setting the listen
>    socket non-blocking (mark_fd_nonblock), which is done both in sender and
>    receiver.

IMHO it's considered good practice to set all fds which are used for
select() to non-blocking mode.  AFAIR the reason is the situation
where a network packet arrives but is discarded because of a checksum
error. In this case it might happen that select() indicates readability
of an fd, but a subsequent read() blocks nevertheless. Maybe it's
unnecessary to set an fd to non-blocking mode if it is only used
for writing. But it won't hurt either, so...
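
To illustrate the pattern, here is a minimal sketch in C. The names
set_nonblock() and read_when_ready() are hypothetical and this is not
the paraslash implementation of mark_fd_nonblock; it only shows why a
non-blocking fd makes a spurious readiness report from select() harmless.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

/* Set O_NONBLOCK on fd, keeping all other file status flags. */
int set_nonblock(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Wait until select() reports fd as readable, then read from it.
 *
 * Returns the number of bytes read, 0 on EOF, -1 on error, and -2 if
 * select() reported readiness but no data was available after all.
 * The -2 case is only distinguishable if fd is non-blocking; on a
 * blocking fd the read() would simply hang.
 */
ssize_t read_when_ready(int fd, void *buf, size_t len)
{
	fd_set rfds;
	ssize_t ret;

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
		return -1;
	ret = read(fd, buf, len);
	if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return -2;
	return ret;
}
```

A caller would treat -2 as "try again" and go back into its select() loop.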

Thanks
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe
