On Tue, Jul 7, 2009 at 1:43 AM, Ian Lindsay<[email protected]> wrote:
> To clarify, can you give an exact procedure to reproduce?
> (E.g. an ftp transfer of a 100MB file from the internet to
> another box, routed through onboard ethernet on the Soekris)

I'll send you the files from /etc and an exact command sequence
tonight, but if you need to test it before then, something like this
should cause it:

1) bring up vr1 and vr2 and bridge them:
ifconfig vr1 inet 192.168.1.1 netmask 255.255.255.0
ifconfig vr2 up
ifconfig bridge0 create
brconfig bridge0 add vr1 add vr2
brconfig bridge0 up

2) bring up ral0 in hostap mode (line from /etc/hostname.ral0):
inet 192.168.2.1 255.255.255.0 NONE media autoselect \
                   mediaopt hostap nwid my_net chan 60

3) Connect a computer with a running ftp server to vr1 (192.168.1.2) (FTPHOST)
4) Connect another computer to vr2 (192.168.1.3) (CLIENT)
5) On CLIENT, ftp into FTPHOST and get a large file (large enough that
it will take about 5 minutes to grab).

This should hang the system.  If you repeat but leave out #2, it will
not.  Whether you bridge vr1 and vr2 or just route between them
doesn't seem to matter.

> I've been getting seemingly random occasional hangs with ral
> in HostAP mode that I haven't been able to correlate with
> any kind of traffic.

Well, although I can reproduce the problem, I can't honestly say I
know what is causing it, but I don't think it is traffic level alone.
For example, what I did above consistently causes the hang.  If I use Samba
instead of FTP for the file transfer, I get the hang consistently, but
at a different point in the transfer.  If I bring up vr0 as a
connection to the Internet and run rtorrent on FTPHOST in upload only,
things don't break at all.

My current theory is that the hang is a function of both bandwidth and
time -- to get the hang the system needs to be pushing more than some
amount of data through it for some amount of time.

Unless someone comes up with a better idea, I'm going to grab iperf
and see if I can't nail down a specific situation (or even simplify
reproducing the problem), but I'm pretty much flying blind here.  I
don't know how OpenBSD's network code is put together, so anything I
do will basically be a random guess.  That's why I sent the email: I'm
hoping someone who knows more can suggest a series of tests to do that
will pin the bug down enough that one of the devs can fix it.
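
A rough sketch of what such an iperf sweep might look like, to separate
the bandwidth and time variables.  Everything here is an assumption
rather than a tested procedure: it presumes iperf 2.x on both machines,
"iperf -s -u" already running on FTPHOST, and the rates and durations
are arbitrary starting points.  It uses UDP only because that lets the
offered load be fixed with -b; the original hang was seen with TCP
transfers, so a TCP run may still be needed.

```shell
#!/bin/sh
# Sketch only: sweep offered load and run length to see which
# combinations hang the box. Values below are hypothetical.

SERVER=192.168.1.2      # FTPHOST from step 3
RATES="5M 20M 50M"      # offered UDP load in bits/sec (assumed values)
DURATIONS="60 300"      # seconds per run (assumed values)

# Print one iperf client command per (rate, duration) pair.
gen_runs() {
    for rate in $RATES; do
        for secs in $DURATIONS; do
            echo "iperf -c $SERVER -u -b $rate -t $secs -i 10"
        done
    done
}

# Dry run by default; pass "run" to actually execute on CLIENT.
if [ "${1:-}" = "run" ]; then
    gen_runs | while read -r cmd; do
        # Timestamp each run so the last line logged shows what hung.
        echo "$(date '+%H:%M:%S') starting: $cmd"
        $cmd </dev/null
    done
else
    gen_runs
fi
```

Run without arguments it only prints the six commands; with "run" it
executes them in order, lowest rate first, so the last timestamped line
before a hang shows which rate and duration were in progress.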

--MHC
