Re: [PATCH] target: Fix NULL dereference during LUN lookup + active I/O shutdown

2017-03-01 Thread Nicholas A. Bellinger
On Thu, 2017-02-23 at 11:46 -0600, Bryant G. Ly wrote:
> > From: Nicholas Bellinger 
> >
> > When transport_clear_lun_ref() is shutting down a se_lun via
> > configfs with new I/O in-flight, it's possible to trigger a
> > NULL pointer dereference in transport_lookup_cmd_lun() due
> > to the fact percpu_ref_get() doesn't do any __PERCPU_REF_DEAD
> > checking before incrementing lun->lun_ref.count after
> > lun->lun_ref has switched to atomic_t mode.
> >
> > This results in a NULL pointer dereference as LUN shutdown
> > code in core_tpg_remove_lun() continues running after the
> > existing ->release() -> core_tpg_lun_ref_release() callback
> > completes, and clears the RCU protected se_lun->lun_se_dev
> > pointer.
> >
> > During the Oops, the state of lun->lun_ref in the process
> > which triggered the NULL pointer dereference looks like
> > the following on v4.1.y stable code:
> >
> > struct se_lun {
> >   lun_link_magic = 4294932337,
> >   lun_status = TRANSPORT_LUN_STATUS_FREE,
> >
> >   .
> >
> >   lun_se_dev = 0x0,
> >   lun_sep = 0x0,
> >
> >   .
> >
> >   lun_ref = {
> >     count = {
> >       counter = 1
> >     },
> >     percpu_count_ptr = 3,
> >     release = 0xa02fa1e0 ,
> >     confirm_switch = 0x0,
> >     force_atomic = false,
> >     rcu = {
> >       next = 0x88154fa1a5d0,
> >       func = 0x8137c4c0
> >     }
> >   }
> > }
> >
> > To address this bug, use percpu_ref_tryget_live() to ensure
> > once __PERCPU_REF_DEAD is visible on all CPUs and ->lun_ref
> > has switched to atomic_t, all new I/Os will fail to obtain
> > a new lun->lun_ref reference.
> >
> > Also use an explicit percpu_ref_kill_and_confirm() callback
> > to block on ->lun_ref_comp to allow the first stage and
> > associated RCU grace period to complete, and then block on
> > ->lun_ref_shutdown waiting for the final percpu_ref_put()
> > to drop the last reference via transport_lun_remove_cmd()
> > before continuing with core_tpg_remove_lun() shutdown.
> >
> > Reported-by: Rob Millner 
> > Tested-by: Rob Millner 
> > Cc: Rob Millner 
> > Tested-by: Vaibhav Tandon 
> > Cc: Vaibhav Tandon 
> > Signed-off-by: Nicholas Bellinger 
> > ---
> >   drivers/target/target_core_device.c    | 10 --
> >   drivers/target/target_core_tpg.c       |  3 ++-
> >   drivers/target/target_core_transport.c | 31 ++-
> >   include/target/target_core_base.h      |  1 +
> >   4 files changed, 41 insertions(+), 4 deletions(-)
> >
> I have seen this and have tested this with our custom kernel.
> 
> So this looks good to me!
> 

Added your Tested-by to the patch.

Thanks Bryant.



Re: [PATCH] target: Fix NULL dereference during LUN lookup + active I/O shutdown

2017-02-23 Thread Bryant G. Ly



From: Nicholas Bellinger 

When transport_clear_lun_ref() is shutting down a se_lun via
configfs with new I/O in-flight, it's possible to trigger a
NULL pointer dereference in transport_lookup_cmd_lun() due
to the fact percpu_ref_get() doesn't do any __PERCPU_REF_DEAD
checking before incrementing lun->lun_ref.count after
lun->lun_ref has switched to atomic_t mode.

This results in a NULL pointer dereference as LUN shutdown
code in core_tpg_remove_lun() continues running after the
existing ->release() -> core_tpg_lun_ref_release() callback
completes, and clears the RCU protected se_lun->lun_se_dev
pointer.

During the Oops, the state of lun->lun_ref in the process
which triggered the NULL pointer dereference looks like
the following on v4.1.y stable code:

struct se_lun {
   lun_link_magic = 4294932337,
   lun_status = TRANSPORT_LUN_STATUS_FREE,

   .

   lun_se_dev = 0x0,
   lun_sep = 0x0,

   .

   lun_ref = {
     count = {
       counter = 1
     },
     percpu_count_ptr = 3,
     release = 0xa02fa1e0 ,
     confirm_switch = 0x0,
     force_atomic = false,
     rcu = {
       next = 0x88154fa1a5d0,
       func = 0x8137c4c0
     }
   }
}

To address this bug, use percpu_ref_tryget_live() to ensure
once __PERCPU_REF_DEAD is visible on all CPUs and ->lun_ref
has switched to atomic_t, all new I/Os will fail to obtain
a new lun->lun_ref reference.
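
To illustrate the difference between the two lookup styles, here is a
minimal userspace sketch. It is not the kernel code: every sketch_*
name is hypothetical, a plain long stands in for the atomic counter,
and a bool stands in for __PERCPU_REF_DEAD. It assumes shutdown runs
with no I/O outstanding, then a new I/O arrives. The unchecked path
ends with count == 1 and a NULL device pointer, which appears
consistent with the dump above (counter = 1, lun_se_dev = 0x0, and
percpu_count_ptr = 3, i.e. both low flag bits set).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; these are not the kernel's percpu_ref types. */
struct sketch_ref {
	long count;	/* models lun_ref.count after the switch to atomic_t mode */
	bool dead;	/* models the __PERCPU_REF_DEAD flag */
};

struct sketch_lun {
	struct sketch_ref ref;
	const char *dev;	/* models se_lun->lun_se_dev */
};

/* Models percpu_ref_get(): increments with no dead check. */
static void sketch_ref_get(struct sketch_ref *r)
{
	r->count++;
}

/* Models percpu_ref_tryget_live(): fails once the ref is marked dead. */
static bool sketch_ref_tryget_live(struct sketch_ref *r)
{
	if (r->dead)
		return false;
	r->count++;
	return true;
}

/* Models LUN shutdown with no I/O outstanding: kill, release, clear dev. */
static void sketch_lun_shutdown(struct sketch_lun *lun)
{
	lun->ref.dead = true;
	assert(lun->ref.count == 1);	/* only the initial reference remains */
	lun->ref.count--;		/* drop it; ->release() would run here */
	lun->dev = NULL;		/* shutdown clears the device pointer */
}

/* Old-style lookup: always proceeds, even on a dead LUN. */
static bool sketch_lookup_unchecked(struct sketch_lun *lun, const char **dev)
{
	sketch_ref_get(&lun->ref);
	*dev = lun->dev;	/* NULL after shutdown; a later deref would crash */
	return true;
}

/* Fixed-style lookup: new I/O is refused once the LUN is dead. */
static bool sketch_lookup_checked(struct sketch_lun *lun, const char **dev)
{
	if (!sketch_ref_tryget_live(&lun->ref))
		return false;
	*dev = lun->dev;
	return true;
}
```

With the checked variant the caller never sees the cleared pointer,
because the lookup fails before the device pointer is read.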

Also use an explicit percpu_ref_kill_and_confirm() callback
to block on ->lun_ref_comp to allow the first stage and
associated RCU grace period to complete, and then block on
->lun_ref_shutdown waiting for the final percpu_ref_put()
to drop the last reference via transport_lun_remove_cmd()
before continuing with core_tpg_remove_lun() shutdown.
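
The two-stage wait described above can be modelled in a few lines. This
is a single-threaded userspace sketch, not the patch itself: the
sketch_* names are hypothetical, and two booleans stand in for the
->lun_ref_comp and ->lun_ref_shutdown completions. In the kernel the
confirm callback runs asynchronously after an RCU grace period; here it
is simply set inline.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace, single-threaded model; booleans stand in for completions. */
struct sketch_lun_ref {
	long count;		/* initial reference plus one per in-flight command */
	bool dead;		/* models __PERCPU_REF_DEAD */
	bool ref_comp;		/* models complete(&lun->lun_ref_comp) */
	bool ref_shutdown;	/* models complete(&lun->lun_ref_shutdown) */
};

/* Models percpu_ref_put(): the final put after kill runs ->release(). */
static void sketch_put(struct sketch_lun_ref *r)
{
	assert(r->count > 0);
	if (--r->count == 0 && r->dead)
		r->ref_shutdown = true;
}

/* Models percpu_ref_kill_and_confirm(): mark dead, confirm, drop initial ref. */
static void sketch_kill_and_confirm(struct sketch_lun_ref *r)
{
	r->dead = true;
	r->ref_comp = true;	/* confirm callback; asynchronous in the kernel */
	sketch_put(r);		/* drop the initial reference */
}

/* Shutdown may continue only after both stages have completed. */
static bool sketch_shutdown_may_continue(const struct sketch_lun_ref *r)
{
	return r->ref_comp && r->ref_shutdown;
}
```

Starting from count == 3 (the initial reference plus two in-flight
commands), sketch_kill_and_confirm() completes the first stage only;
the second stage completes once both commands have called sketch_put().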

Reported-by: Rob Millner 
Tested-by: Rob Millner 
Cc: Rob Millner 
Tested-by: Vaibhav Tandon 
Cc: Vaibhav Tandon 
Signed-off-by: Nicholas Bellinger 
---
  drivers/target/target_core_device.c    | 10 --
  drivers/target/target_core_tpg.c       |  3 ++-
  drivers/target/target_core_transport.c | 31 ++-
  include/target/target_core_base.h      |  1 +
  4 files changed, 41 insertions(+), 4 deletions(-)


I have seen this and have tested this with our custom kernel.

So this looks good to me!

-Bryant