> > On 10/21/2011 4:36 PM, Neil Horman wrote: > > This oops was reported to me recently: > > PID: 5176 TASK: ffff880215274100 CPU: 0 COMMAND: "fc_rport_eq" > > 0 [ffff880218c65760] machine_kexec at ffffffff81031d3b > > 1 [ffff880218c657c0] crash_kexec at ffffffff810b8e92 > > 2 [ffff880218c65890] oops_end at ffffffff814ef890 > > 3 [ffff880218c658c0] no_context at ffffffff8104226b > > 4 [ffff880218c65910] __bad_area_nosemaphore at ffffffff810424f5 > > 5 [ffff880218c65960] bad_area_nosemaphore at ffffffff810425c3 > > 6 [ffff880218c65970] __do_page_fault at ffffffff81042c9d > > 7 [ffff880218c65a90] do_page_fault at ffffffff814f186e > > 8 [ffff880218c65ac0] page_fault at ffffffff814eec25 > > 9 [ffff880218c65bb8] fc_fcp_complete_locked at ffffffffa02ed739 [libfc] > > 10 [ffff880218c65c08] fc_fcp_retry_cmd at ffffffffa02ed86f [libfc] > > 11 [ffff880218c65c28] fc_fcp_recv at ffffffffa02eed3f [libfc] > > 12 [ffff880218c65d28] fc_exch_mgr_reset at ffffffffa02e2373 [libfc] > > 13 [ffff880218c65db8] fc_rport_work at ffffffffa02e9f10 [libfc] > > 14 [ffff880218c65e38] worker_thread at ffffffff8108b250 > > 15 [ffff880218c65ee8] kthread at ffffffff81090806 > > 16 [ffff880218c65f48] kernel_thread at ffffffff8100c10a > > > > It results from two contexts that try to manipulate the same > fcoe_exch_pool > > without syncronizing themselves: > > > > 1) The fcoe event_work workqueue which calls > > fc_rport_work > > fc_exch_mgr_reset > > fc_exch_pool_reset > > > > 2) The FCOE transport destroy path, which schedules a destroy_work > workqueue, > > calling: > > fcoe_destroy_work > > fcoe_if_destroy > > fc_exch_mgr_free > > fc_exch_mgr_del > > fc_exch_mgr_destroy > > > > The pool_reset path holds the pool look, but no references to the pool > manager > > kobject, while exch_mgr_destroy path drops what is ostensibly the last > > reference to the pool manager kobject (causing its freeing), while not > holding > > the pool lock. > > > > The attached patch has been confirmed to prevent the panic. > > > > Signed-off-by: Neil Horman<nhor...@tuxdriver.com> > > CC: Robert Love<robert.w.l...@intel.com> > > --- > > drivers/scsi/libfc/fc_exch.c | 2 ++ > > 1 files changed, 2 insertions(+), 0 deletions(-) > > > > diff --git a/drivers/scsi/libfc/fc_exch.c > b/drivers/scsi/libfc/fc_exch.c > > index 8325489..d7c8100 100644 > > --- a/drivers/scsi/libfc/fc_exch.c > > +++ b/drivers/scsi/libfc/fc_exch.c > > @@ -1815,10 +1815,12 @@ void fc_exch_mgr_reset(struct fc_lport *lport, > u32 sid, u32 did) > > unsigned int cpu; > > > > list_for_each_entry(ema,&lport->ema_list, ema_list) { > > + kref_get(&ema->mp->kref) > > If the above contexts are not synchronized, we may be accessing illegal > pointer in ema. Just before calling kref_get(), the ema may have been > free'd in fc_exch_mgr_del(). We do not still have protection for it, > right? am I missing something? > Are we not holding lport lock when doing fc_exch_mgr_free() to guard the ema_list? I thought we are, I might be wrong.
yi > > > for_each_possible_cpu(cpu) > > fc_exch_pool_reset(lport, > > per_cpu_ptr(ema->mp->pool, cpu), > > sid, did); > > + kref_put(&ema->mp->kref, fc_exch_mgr_destroy); > > } > > } > > EXPORT_SYMBOL(fc_exch_mgr_reset); > > > _______________________________________________ > devel mailing list > devel@open-fcoe.org > https://lists.open-fcoe.org/mailman/listinfo/devel _______________________________________________ devel mailing list devel@open-fcoe.org https://lists.open-fcoe.org/mailman/listinfo/devel