Thanks! - George
On Mon, Jun 13, 2016 at 11:07 AM, Josef 'Jeff' Sipek <jef...@josefsipek.net> wrote: > On Mon, Jun 13, 2016 at 11:00:44 -0400, George Wilson wrote: > > Is the use-after-free issue in openzfs or just Nexenta's gate? Just want > to > > make sure we have a bug filed if it's in openzfs. > > Just Nexenta's gate. E.g., for the l2arc case it looks like: > > } else if (spa->spa_l2cache.sav_vdevs != NULL && > nvlist_lookup_nvlist_array(spa->spa_l2cache.sav_config, > ZPOOL_CONFIG_L2CACHE, &l2cache, &nl2cache) == 0 && > (nv = spa_nvlist_lookup_by_guid(l2cache, nl2cache, guid)) != > NULL) { > /* > * Cache devices can always be removed. > */ > if (vd == NULL) > vd = spa_lookup_by_guid(spa, guid, B_TRUE); > > <Nexenta code deref'ing vd ommitted for brevity> > > spa_vdev_remove_aux(spa->spa_l2cache.sav_config, > ZPOOL_CONFIG_L2CACHE, l2cache, nl2cache, nv); > spa_load_l2cache(spa); > spa->spa_l2cache.sav_sync = B_TRUE; > spa_event_notify(spa, vd, ESC_ZFS_VDEV_REMOVE_AUX); > } else if (vd != NULL && vd->vdev_islog) { > > The problem here is that spa_load_l2cache() can free the vd that we looked > up - > but we end up passing it to the spa_event_notify() call just the same. > > We're fixing it in our gate, but I want to clean it up a bit in openzfs to > avoid this recurring/other bad things happening. The spare case it a bit > more > complicated, but the same basic idea applies. (On top of the event getting > generated even when EBUSY.) > > Jeff. > > > Thanks, > > George > > > > On Mon, Jun 13, 2016 at 10:58 AM, Josef 'Jeff' Sipek < > jef...@josefsipek.net> > > wrote: > > > > > On Mon, Jun 13, 2016 at 10:48:14 -0400, George Wilson wrote: > > > > Jeff, > > > > > > > > Seems reasonable to me. If we wanted to provide more information > about > > > the > > > > vdev being removed then you could make a call to > spa_lookup_by_guid(spa, > > > > guid, B_TRUE) just prior to the call to spa_vdev_remove_aux(). You > would > > > > only do this is vd == NULL but that could be another way to enhance > this > > > > logic. > > > > > > Amusingly enough, the reason I ended up looking at this is exactly > because > > > we have that kind of lookup in Nexenta's gate. It turns out that > using the > > > vd looked up with aux == B_TRUE leads to a use-after-free. :) Yeah, I > > > could > > > try to stash things elsewhere (e.g., make a copy of the vd), but IMO > it's > > > not really worth it. > > > > > > Jeff. > > > > > > > > > > > Thanks, > > > > George > > > > > > > > On Fri, Jun 10, 2016 at 5:14 PM, Alan Somers <asom...@freebsd.org> > > > wrote: > > > > > > > > > On Fri, Jun 10, 2016 at 2:53 PM, Josef 'Jeff' Sipek > > > > > <jef...@josefsipek.net> wrote: > > > > > > I've been looking at the code that 6922 introduced and to me it > looks > > > > > like > > > > > > the change is too complicated/misleading and slightly buggy. > > > > > > > > > > > > l2cache case > > > > > > ------------ > > > > > > > > > > > > At the beginning of the function, we try to convert a guid to a > > > vdev_t > > > > > > pointer by calling spa_lookup_by_guid with aux == B_FALSE. It > is my > > > > > > understanding that this will always result in vd == NULL if the > guid > > > is > > > > > for > > > > > > an L2ARC device. That means that when we make it into the > sav_vdevs > > > != > > > > > NULL > > > > > > else-if, the sole user of vd will always be NULL. > > > > > > > > > > > > This brings up the next question. How useful are these events > since > > > > > they don't > > > > > > include the removed vdev's guid and path? > > > > > > > > > > > > spares case > > > > > > ----------- > > > > > > > > > > > > This is similar to the l2cache case, but a bit more complicated. > > > Unlike > > > > > the > > > > > > l2cache case, this case can fail with EBUSY. However, in the > event > > > of a > > > > > > failure, the code still generates the event. This is easy to > fix by > > > > > moving > > > > > > the event generation higher up. > > > > > > > > > > > > Here, we certainly want to use vd, since vd can be either NULL or > > > > > non-NULL. > > > > > > It is NULL (and unspare is 0) in the case of 'zpool remove' of an > > > unused > > > > > > spare device, but non-NULL (and unspare is 1) in the case of > 'zpool > > > > > detach' > > > > > > of a removed device that a spare took over from. > > > > > > > > > > > > > > > > > > So, with all this said, unless I'm wrong about something, I'd > like > > > review > > > > > > for the combined change: > > > > > > > > > > > > diff --git a/usr/src/uts/common/fs/zfs/spa.c > > > > > b/usr/src/uts/common/fs/zfs/spa.c > > > > > > index 07abe37..0265887 100644 > > > > > > --- a/usr/src/uts/common/fs/zfs/spa.c > > > > > > +++ b/usr/src/uts/common/fs/zfs/spa.c > > > > > > @@ -5473,10 +5473,10 @@ spa_vdev_remove(spa_t *spa, uint64_t > guid, > > > > > boolean_t unspare) > > > > > > ZPOOL_CONFIG_SPARES, spares, nspares, > > > nv); > > > > > > spa_load_spares(spa); > > > > > > spa->spa_spares.sav_sync = B_TRUE; > > > > > > + spa_event_notify(spa, vd, > > > > > ESC_ZFS_VDEV_REMOVE_AUX); > > > > > > } else { > > > > > > error = SET_ERROR(EBUSY); > > > > > > } > > > > > > - spa_event_notify(spa, vd, > ESC_ZFS_VDEV_REMOVE_AUX); > > > > > > } else if (spa->spa_l2cache.sav_vdevs != NULL && > > > > > > > nvlist_lookup_nvlist_array(spa->spa_l2cache.sav_config, > > > > > > ZPOOL_CONFIG_L2CACHE, &l2cache, &nl2cache) == 0 && > > > > > > @@ -5488,7 +5488,7 @@ spa_vdev_remove(spa_t *spa, uint64_t guid, > > > > > boolean_t unspare) > > > > > > ZPOOL_CONFIG_L2CACHE, l2cache, nl2cache, nv); > > > > > > spa_load_l2cache(spa); > > > > > > spa->spa_l2cache.sav_sync = B_TRUE; > > > > > > - spa_event_notify(spa, vd, > ESC_ZFS_VDEV_REMOVE_AUX); > > > > > > + spa_event_notify(spa, NULL, > ESC_ZFS_VDEV_REMOVE_AUX); > > > > > > } else if (vd != NULL && vd->vdev_islog) { > > > > > > ASSERT(!locked); > > > > > > ASSERT(vd == vd->vdev_top); > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Jeff. > > > > > > > > > > > > -- > > > > > > I already backed up the [server] once, I can do it again. > > > > > > - a sysadmin threatening to do more frequent > backups > > > > > > > > > > From inspection your proposal looks good to me, but I won't have > time > > > > > to test it until next week. > > > > > > > > > > -Alan > > > > > > > > > > > -- > > > Bad pun of the week: The formula 1 control computer suffered from a > race > > > condition > > > > > -- > mainframe, n.: > An obsolete device still used by thousands of obsolete companies serving > billions of obsolete customers and making huge obsolete profits for their > obsolete shareholders. And this year's run twice as fast as last year's. > ------------------------------------------- openzfs-developer Archives: https://www.listbox.com/member/archive/274414/=now RSS Feed: https://www.listbox.com/member/archive/rss/274414/28015062-cce53afa Modify Your Subscription: https://www.listbox.com/member/?member_id=28015062&id_secret=28015062-f966d51c Powered by Listbox: http://www.listbox.com