On Tuesday, November 29, 2011 9:28:42 am Gleb Smirnoff wrote:
>   Daan,
> 
> On Tue, Nov 29, 2011 at 01:07:13AM +0100, Daan Vreeken wrote:
> D> Thanks for the looking into this and for your quick commit. I like your 
> twist
> D> on the patch with the move from the unit bitmap to allocating unit numbers
> D> with alloc_unr(9).
> D> 
> D> I do have two comments on the new code though.
> D> Before, in the !wildcard case, ifc_alloc_unit() would return ENOSPC when 
> the
> D> user requested a unit number larger than ifc->ifc_maxunit. Now the function
> D> returns EEXIST in that case because alloc_unr_specific() returns -1 both
> D> when a number is already allocated and when the number exceeds it's limits.
> D> This leads to the somewhat confusing:
> D> 
> D>         # ifconfig bridge123456 create
> D>         ifconfig: SIOCIFCREATE2: File exists
> D> 
> D> Apart from that I'm a bit worried about the "/* XXXGL: yep, it's a unit 
> leak
> D> */" part of your change. Once a unit number is leaked, there currently 
> seems
> D> to be no way to ever get it back. In a worst case scenario, where the 
> names of
> D> multiple interfaces in a row collides with the numbers alloc_unr() 
> returns, an
> D> entire row of unit numbers could be leaked during the creation of a single
> D> cloned interface.
> D> We have a product where cloned interfaces come and go very frequently. I'd
> D> hate to run out of unit numbers after 'just a few' years of uptime ;-)
> D> 
> D> I've created a new patch on top of your change that fixes both of these
> D> problems. Could you have a look at the attached diff?
> 
> Thanks, I will work on applying it.
> 
> D> > Considering the second part, that adds locking. Unfortunately, right now 
> we
> D> > have numerous races in the network configuration ocde. Many 
> SIOCSsomething
> D> > ioctls can race with each other producing unpredictable results and 
> kernel
> D> > panics. So, running two ifconfig(8) in parallel is a bad idea today. :(
> D> > Your patch with IFNET_NAMING_LOCK() just plumbs one race case: a race
> D> > between two SIOCSIFNAME ioctls. And it doesn't plumb a race between
> D> > SIOCSIFNAME vs SIOCIFCREATE, because IFNET_NAMING_LOCK() is dropped after
> D> > unit allocation, but prior to interface attachement to global interface
> D> > list.
> D> 
> D> Are you sure? With the patch in the PR, the relevant code in 
> D> ifc_simple_create() would become :
> D> 
> D>    ...
> D>    IFNET_NAMING_LOCK();
> D>    err = ifc_alloc_unit(ifc, &unit);
> D>    if (err != 0) {
> D>            IFNET_NAMING_UNLOCK();
> D>            return (err);
> D>    }
> D> 
> D>    err = ifcs->ifcs_create(ifc, unit, params);
> D>    IFNET_NAMING_UNLOCK();
> D>    if (err != 0) {
> D>            ifc_free_unit(ifc, unit);
> D>            return (err);
> D>    }
> D>    ...
> D> 
> D> The lock is acquired before the call to ifc_alloc_unit().
> D> In the non-error case (e.g. when creating an if_bridge interface) the call
> D> ends up calling bridge_clone_create(), which calls ether_ifattach(), which
> D> calls if_attach() -> if_attach_internal() where the ifp is added to the 
> ifnet
> D> list.
> D> Only when that's done, the lock is dropped.
> D> 
> D> Am I overlooking something obvious here?
> 
> The code above isn't correct since it holds mutex when calling 
> ifcs->ifcs_create().
> These methods may (they actually do) call malloc() with M_WAITOK.
> 
> In general FreeBSD uses M_WAITOK on the configuration code paths, like
> any SIOCSsomething ioctl. So to do correct protection, you first need to
> allocate every kind of memory needed, then lock mutex, then run through 
> configuration
> sequence, then release mutex and in case of error free all allocated memory.
> Sounds easy, but isn't in practice, especially when several network modules
> are involved.
> 
> So I'm still thinking about...
> 
> D> > From my point of view, we need a generic approach to ioctl() vs ioctl()
> D> > races, may be some global serializer of all re-configuration requests of
> D> > interfaces and addresses.
> 
> ... but several developers have noted that this approach may have some hidden
> problems.  When experimenting with new CARP, I have tried it on the 
> carp_ioctl()
> without any problems. The idea is simple:
> 
> static int serializer = 0;
> static struct mtx serializer_mtx;
> MTX_SYSINIT("sermtx", &serializer_mtx, "ioctl serializer mutex", MTX_DEF);
> 
> int
> foo_ioctl()
> {
>       mtx_lock(&serializer_mtx);
>       if (serializer != 0)
>               msleep(&serializer, &serializer_mtx, 0, "ioctl", 0);
>       serializer = 1;
>       mtx_unlock(&serializer_mtx);
> 
>       ... any code here, uncluding calls to different lower layers...
> 
>       mtx_lock(&serializer_mtx);
>       serializer = 0;
>       wakeup(&serializer);
>       mtx_unlock(&serializer_mtx);
> 
>       return (error);
> }
> 
> I have tried this for carp_ioctl() and it worked fine. You can try it for
> entire ifioctl() and all its descendants, but be aware of hidden problems :)

Hmm, is this much different in effect than:

static struct sx serializer_sx;
SX_SYSINIT(...);

int
foo_ioctl()
{
        sx_xlock(&serializer_sx);

        ... any code here

        sx_xunlock(&serializer_sx);
}

?

Using an sx lock is shorter and also allows WITNESS to catch possible cycles
that can be otherwise missed with home-rolled locks.

Also, you should really use 'while (serializer)', not if.  Currently this 
doesn't
really serialize since you can have this:

- three threads all try to run foo_ioctl()
- first thread doesn't block and sets serializer 1
- next two threads both block
- first thread finishes and does a wakeup
- both threads resume and run the 'any code here' bits in parallel
  (not serialized)

If that is not the desired behavior then you need to use a while().  Also, if
this really is a bug and not the desired behavior it's even more reason to use
existing primitives like sx(9) rather than trying to roll your own locks.

-- 
John Baldwin
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Reply via email to