On Fri, Oct 21, 2011 at 05:47:50PM -0700, Zou, Yi wrote:
> > > This oops was reported to me recently:
> > > PID: 5176 TASK: ffff880215274100 CPU: 0 COMMAND: "fc_rport_eq"
> > > 0 [ffff880218c65760] machine_kexec at ffffffff81031d3b
> > > 1 [ffff880218c657c0] crash_kexec at ffffffff810b8e92
> > > 2 [ffff880218c65890] oops_end at ffffffff814ef890
> > > 3 [ffff880218c658c0] no_context at ffffffff8104226b
> > > 4 [ffff880218c65910] __bad_area_nosemaphore at ffffffff810424f5
> > > 5 [ffff880218c65960] bad_area_nosemaphore at ffffffff810425c3
> > > 6 [ffff880218c65970] __do_page_fault at ffffffff81042c9d
> > > 7 [ffff880218c65a90] do_page_fault at ffffffff814f186e
> > > 8 [ffff880218c65ac0] page_fault at ffffffff814eec25
> > > 9 [ffff880218c65bb8] fc_fcp_complete_locked at ffffffffa02ed739 [libfc]
> > > 10 [ffff880218c65c08] fc_fcp_retry_cmd at ffffffffa02ed86f [libfc]
> > > 11 [ffff880218c65c28] fc_fcp_recv at ffffffffa02eed3f [libfc]
> > > 12 [ffff880218c65d28] fc_exch_mgr_reset at ffffffffa02e2373 [libfc]
> > > 13 [ffff880218c65db8] fc_rport_work at ffffffffa02e9f10 [libfc]
> > > 14 [ffff880218c65e38] worker_thread at ffffffff8108b250
> > > 15 [ffff880218c65ee8] kthread at ffffffff81090806
> > > 16 [ffff880218c65f48] kernel_thread at ffffffff8100c10a
> > >
> > > It results from two contexts that try to manipulate the same
> > > fcoe_exch_pool without synchronizing themselves:
> > >
> > > 1) The fcoe event_work workqueue which calls
> > >    fc_rport_work
> > >    fc_exch_mgr_reset
> > >    fc_exch_pool_reset
> > >
> > > 2) The FCOE transport destroy path, which schedules a destroy_work
> > >    workqueue, calling:
> > >    fcoe_destroy_work
> > >    fcoe_if_destroy
> > >    fc_exch_mgr_free
> > >    fc_exch_mgr_del
> > >    fc_exch_mgr_destroy
> > >
> > > The pool_reset path holds the pool look, but no references to the
> > > pool manager kobject, while exch_mgr_destroy path drops what is
> > > ostensibly the last reference to the pool manager kobject (causing
> > > its freeing), while not holding the pool lock.
> > >
> > > The attached patch has been confirmed to prevent the panic.
> > >
> > > Signed-off-by: Neil Horman <nhor...@tuxdriver.com>
> > > CC: Robert Love <robert.w.l...@intel.com>
> >
> > Thanks, Neil.
> >
> > yi
> Neil, I have fixed the issues below while applying; the following
> will be updated to your original patch description when I pull
> this in later to open-fcoe.
>
> 1. added missing ';' at kref_get() to fix compiling error
> 2. added the declaration of fc_exch_mgr_destroy() to fix compiling error
> 3. fixed one typo of 'look' to 'lock' in patch description
> 4. added a prefix of libfc in patch title
>
> -yi
>
Thank you, and I apologize. I had those fixed locally, but neglected to
amend my commit prior to running git-send-email.

Neil
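
For anyone reading this thread without the patch in front of them, the race
described in the quoted report can be sketched in userspace C. This is only
an illustration, not the libfc change itself: struct mgr, mgr_alloc(),
mgr_get(), mgr_put() and pool_reset() are hypothetical names, a C11 atomic
counter stands in for the kernel's kref, and a pthread mutex stands in for
the pool lock. The idea shown is that the reset path pins the manager with
its own reference before touching the pool, so a concurrent "destroy" that
drops its reference cannot free the manager underneath it.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an exchange manager that owns a pool. */
struct mgr {
    atomic_int refcount;        /* stand-in for a kref */
    pthread_mutex_t pool_lock;  /* stand-in for the pool lock */
    int pool_entries;           /* state protected by pool_lock */
    int *freed_flag;            /* set to 1 when the manager is released */
};

/* Allocate a manager holding one initial reference. */
struct mgr *mgr_alloc(int entries, int *freed_flag)
{
    struct mgr *m = malloc(sizeof(*m));
    if (!m)
        return NULL;
    atomic_init(&m->refcount, 1);
    pthread_mutex_init(&m->pool_lock, NULL);
    m->pool_entries = entries;
    m->freed_flag = freed_flag;
    *freed_flag = 0;
    return m;
}

/* Take a reference; the caller must already hold a valid one. */
void mgr_get(struct mgr *m)
{
    atomic_fetch_add(&m->refcount, 1);
}

/* Drop a reference; the last put releases the manager. */
void mgr_put(struct mgr *m)
{
    if (atomic_fetch_sub(&m->refcount, 1) == 1) {
        *m->freed_flag = 1;
        pthread_mutex_destroy(&m->pool_lock);
        free(m);
    }
}

/* Reset path: pin the manager, then work on the pool under its lock. */
void pool_reset(struct mgr *m)
{
    mgr_get(m);
    pthread_mutex_lock(&m->pool_lock);
    m->pool_entries = 0;
    pthread_mutex_unlock(&m->pool_lock);
    mgr_put(m);
}
```

With this shape, a destroy path that calls mgr_put() while a reset is in
flight only drops the count; the memory is released by whichever side does
the final put. The sketch assumes the reset path's caller holds a valid
reference at the moment it calls mgr_get().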