On Mon, Oct 24, 2011 at 11:59:52AM -0700, Zou, Yi wrote:
>
> >
> > On 10/21/2011 4:36 PM, Neil Horman wrote:
> > > This oops was reported to me recently:
> > > PID: 5176 TASK: ffff880215274100 CPU: 0 COMMAND: "fc_rport_eq"
> > > 0 [ffff880218c65760] machine_kexec at ffffffff81031d3b
> > > 1 [ffff880218c657c0] crash_kexec at ffffffff810b8e92
> > > 2 [ffff880218c65890] oops_end at ffffffff814ef890
> > > 3 [ffff880218c658c0] no_context at ffffffff8104226b
> > > 4 [ffff880218c65910] __bad_area_nosemaphore at ffffffff810424f5
> > > 5 [ffff880218c65960] bad_area_nosemaphore at ffffffff810425c3
> > > 6 [ffff880218c65970] __do_page_fault at ffffffff81042c9d
> > > 7 [ffff880218c65a90] do_page_fault at ffffffff814f186e
> > > 8 [ffff880218c65ac0] page_fault at ffffffff814eec25
> > > 9 [ffff880218c65bb8] fc_fcp_complete_locked at ffffffffa02ed739 [libfc]
> > > 10 [ffff880218c65c08] fc_fcp_retry_cmd at ffffffffa02ed86f [libfc]
> > > 11 [ffff880218c65c28] fc_fcp_recv at ffffffffa02eed3f [libfc]
> > > 12 [ffff880218c65d28] fc_exch_mgr_reset at ffffffffa02e2373 [libfc]
> > > 13 [ffff880218c65db8] fc_rport_work at ffffffffa02e9f10 [libfc]
> > > 14 [ffff880218c65e38] worker_thread at ffffffff8108b250
> > > 15 [ffff880218c65ee8] kthread at ffffffff81090806
> > > 16 [ffff880218c65f48] kernel_thread at ffffffff8100c10a
> > >
> > > It results from two contexts that manipulate the same fcoe_exch_pool
> > > without synchronizing with each other:
> > >
> > > 1) The fcoe event_work workqueue which calls
> > > fc_rport_work
> > > fc_exch_mgr_reset
> > > fc_exch_pool_reset
> > >
> > > 2) The FCOE transport destroy path, which schedules a destroy_work
> > > workqueue, calling:
> > > fcoe_destroy_work
> > > fcoe_if_destroy
> > > fc_exch_mgr_free
> > > fc_exch_mgr_del
> > > fc_exch_mgr_destroy
> > >
> > > The pool_reset path holds the pool lock, but takes no reference on the
> > > pool manager kobject, while the exch_mgr_destroy path drops what is
> > > ostensibly the last reference to the pool manager kobject (causing it
> > > to be freed) without holding the pool lock.
> > >
> > > The attached patch has been confirmed to prevent the panic.
> > >
> > > Signed-off-by: Neil Horman <nhor...@tuxdriver.com>
> > > CC: Robert Love <robert.w.l...@intel.com>
> > > ---
> > > drivers/scsi/libfc/fc_exch.c | 2 ++
> > > 1 files changed, 2 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/drivers/scsi/libfc/fc_exch.c b/drivers/scsi/libfc/fc_exch.c
> > > index 8325489..d7c8100 100644
> > > --- a/drivers/scsi/libfc/fc_exch.c
> > > +++ b/drivers/scsi/libfc/fc_exch.c
> > > @@ -1815,10 +1815,12 @@ void fc_exch_mgr_reset(struct fc_lport *lport, u32 sid, u32 did)
> > > unsigned int cpu;
> > >
> > > list_for_each_entry(ema, &lport->ema_list, ema_list) {
> > > + kref_get(&ema->mp->kref);
> >
> > If the above contexts are not synchronized, we may be accessing an
> > invalid pointer through ema. Just before calling kref_get(), the ema
> > may already have been freed in fc_exch_mgr_del(). We still have no
> > protection for it, right? Am I missing something?
> >
> Aren't we holding the lport lock when doing fc_exch_mgr_free() to guard
> the ema_list?
> I thought we were, but I might be wrong.
>
I thought that lock got grabbed somewhere in fcoe_destroy_work.
Neil
> yi
>
> >
> > > for_each_possible_cpu(cpu)
> > > fc_exch_pool_reset(lport,
> > > per_cpu_ptr(ema->mp->pool, cpu),
> > > sid, did);
> > > + kref_put(&ema->mp->kref, fc_exch_mgr_destroy);
> > > }
> > > }
> > > EXPORT_SYMBOL(fc_exch_mgr_reset);
> >
> >
> > _______________________________________________
> > devel mailing list
> > devel@open-fcoe.org
> > https://lists.open-fcoe.org/mailman/listinfo/devel