With the fcoe-next tree and the patch set I'm building, I hit this
BUG() at line 1862 in fcoe.c:
int fcoe_hostlist_remove(const struct fc_lport *lp)
{
	struct fcoe_softc *fc;

	write_lock_bh(&fcoe_hostlist_lock);
	fc = fcoe_hostlist_lookup_softc(fcoe_netdev(lp));
	BUG_ON(!fc);
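This is called from fcoe_destroy(), and I guess two of those were
active at the same time, since both 'fcoeadm -d' and 'rmmod' might be
trying to remove the same instance. If so, whichever one gets there
second finds the entry already gone, the lookup returns NULL, and the
BUG_ON() fires.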
I think the list locking should be simplified so that any create/delete/lookup
just uses the same lock, and holds it across the entire operation.
I think there may be unapplied patches floating out there to fix this,
but it'd be really nice if they got applied.
Otherwise, my tests keep hitting this and every time we rebase I have
to come up with some temporary workaround.
Here's a test script, which I would like everyone to use to bang on
the create/delete/rmmod issues. It uses fcc but I commented that
out and you could use the equivalent fcoeadm command.
Note the fcoeadm commands are done in parallel for all specified nics.
Do this on a machine with at least 4 threads and a list of 2 or more nics.
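#! /bin/bash

nics="eth0 eth4"
count=1000
bs=64k
iopass=true	# alternates: one pass with I/O, the next pass idle

while :
do
    modprobe fcoe

    # create all instances in parallel, then wait for the creates to finish
    for nic in $nics
    do
        fcoeadm -c $nic &
        :
    done
    wait
    sleep 10
    # fcc	# note could do fcoeadm -l or something here.

    if $iopass
    then
        # read from every disk in parallel
        for disk in /dev/sd[b-z]
        do
            dd if=$disk of=/dev/null bs=$bs \
                count=$count iflag=direct &
            :
        done
        iopass=false
        echo waiting for i/o bs $bs count $count
        date
        wait
        date
        sleep 5
    else
        iopass=true
        sleep 10
    fi

    # destroy all instances in parallel; there is no wait here, so the
    # destroys can still be running when rmmod starts below
    for nic in $nics
    do
        fcoeadm -d $nic &
        :
    done

    # retry rmmod once a second until it succeeds
    while :
    do
        rmmod fcoe libfcoe libfc && break
        sleep 1
    done
    sleep 10
done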
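
Since the test only makes sense with at least 4 threads and 2 or more
nics, it may help to check that before starting the loop. This is only a
sketch: preflight is a made-up helper name, not part of the script above,
and it assumes nproc from GNU coreutils is available to count threads.

```bash
# Hypothetical helper, not part of the original script.
# Usage: preflight "<nic list>" [thread_count]
# Returns 0 if there are at least 4 threads and at least 2 nics.
preflight() {
    local threads nic nic_count
    # second argument overrides the detected thread count (handy for testing)
    threads=${2:-$(nproc)}
    nic_count=0
    # $1 is left unquoted on purpose so the list splits on whitespace
    for nic in $1
    do
        nic_count=$((nic_count + 1))
    done
    if [ "$threads" -lt 4 ]
    then
        echo "need at least 4 threads, have $threads" >&2
        return 1
    fi
    if [ "$nic_count" -lt 2 ]
    then
        echo "need at least 2 nics, have $nic_count" >&2
        return 1
    fi
    return 0
}
```

To use it, put the function near the top of the script and add
'preflight "$nics" || exit 1' after the nics= line.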
Joe