On 24 Jul 2011, at 04:51, Ryan Stone wrote:

> I ran headlong into this issue today when trying to test some network
> stack changes.  It's depressingly easy to crash HEAD by periodically
> destroying vlan interfaces while you are sending traffic over those
> interfaces -- I get a panic within minutes.
> 
>> http://people.freebsd.org/~glebius/patches/ifnet.no_free
> 
> This patch makes my test system survive longer but does not resolve the issue.

Unfortunately, I'm a bit preoccupied currently, so haven't had a chance to 
follow up as yet, but just to follow up on the general issue here: this problem 
existed pre-SMP as well, and could be easily triggered by DUMMYNET and 
removable interfaces as well (as additional queuing delays just make the 
problem worse). In general: we need a solution that penalises interface 
removal, not common-case packet processing. As many packets have their source 
ifnet looked up in common-case processing (worth checking this assumption) 
because it's cheap, any solution that causes an interface lookup on every input 
packet (with synchronisation) is also an issue.

Instead, I think we should go for a more radical notion, which is a bit harder 
to implement in our stack: the network stack needs a race-free way to "drain" 
all mbufs referring to a particular ifnet, which does not cause existing 
processing to become more expensive. This is easy in some subsystems, but more 
complex in others -- and the composition of subsystems makes it all much harder 
since we need to know that (to be 100% correct) packets aren't getting passed 
between subsystems (and hence belong to neither) in a way that races with a 
sweep through the subsystems. It may be possible to get this 99.9% right simply 
by providing a series of callbacks into subsystems that cause queues to be 
walked and drained of packets matching the doomed ifnet. It may also be quite 
cheap to have subsystems that "hold" packets outside of explicit queues for 
some period (i.e., in a thread-local pointer out of the stack) add explicit 
invalidation tests  (i.e., for IFF_DYING) before handing off to prevent those 
packets from traversing into other subsystems -- which can be done 
synchronisation-free, but still wouldn't 100% prevent the race

Just to give an example: netisr should offer a method for 
netisr_drain_ifnet(struct ifnet *) that causes netisr to walk all of its queues 
to find matching packets and free them. Due to direct dispatch and thread-local 
queues during processing, netisr should also check IFF_DYING before handing off.

If we do that, I wonder how robust the system then becomes...? This may not be 
too hard to test. But I'd rather we penalise ifnet removal than, say, the IP 
input path when it needs to check a source interface property.

Robert_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[email protected]"

Reply via email to