> 
> On 10/21/2011 4:36 PM, Neil Horman wrote:
> > This oops was reported to me recently:
> > PID: 5176   TASK: ffff880215274100  CPU: 0   COMMAND: "fc_rport_eq"
> > 0 [ffff880218c65760] machine_kexec at ffffffff81031d3b
> > 1 [ffff880218c657c0] crash_kexec at ffffffff810b8e92
> > 2 [ffff880218c65890] oops_end at ffffffff814ef890
> > 3 [ffff880218c658c0] no_context at ffffffff8104226b
> > 4 [ffff880218c65910] __bad_area_nosemaphore at ffffffff810424f5
> > 5 [ffff880218c65960] bad_area_nosemaphore at ffffffff810425c3
> > 6 [ffff880218c65970] __do_page_fault at ffffffff81042c9d
> > 7 [ffff880218c65a90] do_page_fault at ffffffff814f186e
> > 8 [ffff880218c65ac0] page_fault at ffffffff814eec25
> > 9 [ffff880218c65bb8] fc_fcp_complete_locked at ffffffffa02ed739 [libfc]
> > 10 [ffff880218c65c08] fc_fcp_retry_cmd at ffffffffa02ed86f [libfc]
> > 11 [ffff880218c65c28] fc_fcp_recv at ffffffffa02eed3f [libfc]
> > 12 [ffff880218c65d28] fc_exch_mgr_reset at ffffffffa02e2373 [libfc]
> > 13 [ffff880218c65db8] fc_rport_work at ffffffffa02e9f10 [libfc]
> > 14 [ffff880218c65e38] worker_thread at ffffffff8108b250
> > 15 [ffff880218c65ee8] kthread at ffffffff81090806
> > 16 [ffff880218c65f48] kernel_thread at ffffffff8100c10a
> >
> > It results from two contexts that try to manipulate the same
> fcoe_exch_pool
> > without syncronizing themselves:
> >
> > 1) The fcoe event_work workqueue which calls
> > fc_rport_work
> >   fc_exch_mgr_reset
> >     fc_exch_pool_reset
> >
> > 2) The FCOE transport destroy path, which schedules a destroy_work
> workqueue,
> > calling:
> > fcoe_destroy_work
> >   fcoe_if_destroy
> >    fc_exch_mgr_free
> >     fc_exch_mgr_del
> >      fc_exch_mgr_destroy
> >
> > The pool_reset path holds the pool look, but no references to the pool
> manager
> > kobject, while exch_mgr_destroy path drops what is ostensibly the last
> > reference to the pool manager kobject (causing its freeing), while not
> holding
> > the pool lock.
> >
> > The attached patch has been confirmed to prevent the panic.
> >
> > Signed-off-by: Neil Horman<nhor...@tuxdriver.com>
> > CC: Robert Love<robert.w.l...@intel.com>
> > ---
> >   drivers/scsi/libfc/fc_exch.c |    2 ++
> >   1 files changed, 2 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/scsi/libfc/fc_exch.c
> b/drivers/scsi/libfc/fc_exch.c
> > index 8325489..d7c8100 100644
> > --- a/drivers/scsi/libfc/fc_exch.c
> > +++ b/drivers/scsi/libfc/fc_exch.c
> > @@ -1815,10 +1815,12 @@ void fc_exch_mgr_reset(struct fc_lport *lport,
> u32 sid, u32 did)
> >     unsigned int cpu;
> >
> >     list_for_each_entry(ema,&lport->ema_list, ema_list) {
> > +           kref_get(&ema->mp->kref)
> 
> If the above contexts are not synchronized, we may be accessing illegal
> pointer in ema.  Just before calling kref_get(), the ema may have been
> free'd in fc_exch_mgr_del(). We do not still have protection for it,
> right? am I missing something?
> 
Are we not holding lport lock when doing fc_exch_mgr_free() to guard the 
ema_list?
I thought we are, I might be wrong.

yi

> 
> >             for_each_possible_cpu(cpu)
> >                     fc_exch_pool_reset(lport,
> >                                        per_cpu_ptr(ema->mp->pool, cpu),
> >                                        sid, did);
> > +           kref_put(&ema->mp->kref, fc_exch_mgr_destroy);
> >     }
> >   }
> >   EXPORT_SYMBOL(fc_exch_mgr_reset);
> 
> 
> _______________________________________________
> devel mailing list
> devel@open-fcoe.org
> https://lists.open-fcoe.org/mailman/listinfo/devel
_______________________________________________
devel mailing list
devel@open-fcoe.org
https://lists.open-fcoe.org/mailman/listinfo/devel

Reply via email to