In the case of interface destroy, all exches must be freed before the
EM mempool is destroyed, but currently some exches could still be
releasing in their scheduled work when the EM mempool destroy is called.

Fix this issue by adding a workqueue thread, fc_exch_workqueue, for
exch timeout work scheduling, and then flushing this workqueue before
the EM mempool gets destroyed.
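The flush-before-destroy ordering above can be sketched as a minimal userspace analogy (an assumption on my part: this uses pthreads, not the kernel workqueue API, and the names `flush_queue`/`run_demo` are made up for illustration). Release work runs on a dedicated worker thread, and the teardown path drains that worker before the "mempool" is considered safe to free:

```c
/* Userspace analogy (not kernel code) of the fix: exchange-release
 * work runs on a dedicated worker thread, and teardown flushes that
 * worker before the "mempool" is destroyed. */
#include <pthread.h>

#define NUM_EXCH 8

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int queued;        /* release jobs waiting for the worker */
static int live;          /* "exchanges" not yet freed */
static int shutting_down;

static void *worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	for (;;) {
		while (!queued && !shutting_down)
			pthread_cond_wait(&cond, &lock);
		if (queued) {
			queued--;
			live--;   /* free one exchange */
			pthread_cond_broadcast(&cond);
		} else {
			break;    /* shutting down, queue drained */
		}
	}
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* flush: wait until every queued release job has run */
static void flush_queue(void)
{
	pthread_mutex_lock(&lock);
	while (queued)
		pthread_cond_wait(&cond, &lock);
	pthread_mutex_unlock(&lock);
}

/* returns the number of exchanges still live at pool destroy time */
int run_demo(void)
{
	pthread_t tid;
	int leaked;

	live = queued = NUM_EXCH;
	pthread_create(&tid, NULL, worker, NULL);

	flush_queue();            /* the fix: flush before destroying the pool */

	pthread_mutex_lock(&lock);
	leaked = live;            /* safe to destroy the mempool only if 0 */
	shutting_down = 1;
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);

	pthread_join(tid, NULL);
	return leaked;
}
```

In this model the flush guarantees every queued release has completed, so `run_demo()` observes zero live exchanges at destroy time, mirroring what fc_exch_mgr_free() expects after the added flush_workqueue().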

cancel_delayed_work_sync cannot be called during the final
fc_exch_reset due to lport lock ordering, so remove the related
comment block, which is no longer relevant.
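The lock-ordering problem that rules out a synchronous cancel can be modelled in a few lines (again an analogy, not kernel code; `handler_can_run` and `sync_cancel_deadlocks` are hypothetical names): a sync cancel waits for the timeout handler to finish, but the reset path is called with the lport lock held and the handler may itself need that lock, so the wait can never complete.

```c
/* Analogy (not kernel code): why a synchronous cancel deadlocks here.
 * The reset path holds the lport lock while cancelling; the timeout
 * handler (resp function) may also take the lport lock, so waiting
 * for the handler while holding that lock can never complete. */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lport_lock = PTHREAD_MUTEX_INITIALIZER;

/* stand-in for a resp handler that needs the lport lock */
static bool handler_can_run(void)
{
	/* trylock returns EBUSY if the mutex is held by anyone */
	if (pthread_mutex_trylock(&lport_lock) != 0)
		return false;      /* lock held elsewhere: handler blocks */
	pthread_mutex_unlock(&lport_lock);
	return true;
}

/* returns true if a sync cancel issued with the lport lock held
 * would wait forever for the handler */
bool sync_cancel_deadlocks(void)
{
	bool stuck;

	pthread_mutex_lock(&lport_lock);   /* reset path holds lport lock */
	stuck = !handler_can_run();        /* handler cannot make progress */
	pthread_mutex_unlock(&lport_lock);
	return stuck;
}
```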

More details on this issue are discussed in this email thread:
http://www.open-fcoe.org/pipermail/devel/2009-August/003439.html

RFC notes:-

   This issue is a rare race to reproduce, so I added test code to
simulate it. The test code starts a dummy large exch timer on an exch
during its final release in fc_exch_release() and then makes sure this
exch is finally released only in the timer execution. I ensured this
by adding an additional exch tracking state and setting ep->resp to
NULL. I also skipped cancel_delayed_work in the fc_exch_reset code
path for this dummy exch, so that it would be released during the
added flush_workqueue(), forcing the fc_exch_timeout() handler to run
before its large timeout value.

   Now at destroy I do see the dummy exch still pending before the
added flush_workqueue(), but it remains pending even after
flush_workqueue() returns, though I expected flush_workqueue() to
force the exch timer work to execute and then block until all pending
work was flushed. I see that the dummy exch timer later fires at its
timeout value, and only then does the dummy exch get freed; I also had
test code waiting for all exches to be freed for the large dummy exch
timeout value.

  So it seems that either flush_workqueue() is not working as expected
in this case or I messed up the testing while simulating this race.

  Any thoughts on why flush_workqueue() did not block in this case?
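  One possible reading (my assumption, sketched here as an analogy rather than the actual kernel internals): a delayed work item sits in its timer until the timeout fires, and only then is it placed on the workqueue, so a flush issued while the timer is still pending has nothing of it to wait for. The model below uses made-up names (`timer_fire`, `flush`, `delayed_work_survives_flush`) to illustrate that behavior:

```c
/* Minimal single-threaded model (an analogy, not kernel code) of why
 * a flush only drains work already on the queue: a delayed work item
 * waits in its timer first, is queued only when the timer fires, and
 * so is invisible to a flush issued before the timeout. */
#include <stdbool.h>

struct delayed_work {
	bool timer_pending;   /* still waiting for its timeout */
	bool queued;          /* moved onto the workqueue */
	bool done;            /* handler has run */
};

/* timer expiry moves the item from the timer onto the queue */
static void timer_fire(struct delayed_work *w)
{
	if (w->timer_pending) {
		w->timer_pending = false;
		w->queued = true;
	}
}

/* flush runs everything currently on the queue -- and nothing else */
static void flush(struct delayed_work **items, int n)
{
	for (int i = 0; i < n; i++) {
		if (items[i]->queued) {
			items[i]->queued = false;
			items[i]->done = true;
		}
	}
}

/* returns true if a not-yet-fired delayed work survives a flush */
bool delayed_work_survives_flush(void)
{
	struct delayed_work w = { .timer_pending = true };
	struct delayed_work *items[] = { &w };

	flush(items, 1);   /* timer has not fired: nothing to flush */
	if (!w.timer_pending || w.done)
		return false;

	timer_fire(&w);    /* only now does the work reach the queue */
	flush(items, 1);
	return w.done;     /* a second flush after expiry does run it */
}
```

  Under this reading, the dummy exch pending in the test above is exactly the case the flush cannot see, which would match the observed behavior of it only being freed when its timer eventually fires.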

  I don't see the issue the earlier fix had with calling
flush_scheduled_work() from another work context, fcoe_destroy_work.
So perhaps Joe you can test this fix, since you were seeing this race,
to confirm whether my testing was messed up and the fix works.

Signed-off-by: Vasu Dev <[email protected]>
---

 drivers/scsi/libfc/fc_exch.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/libfc/fc_exch.c b/drivers/scsi/libfc/fc_exch.c
index c1c1574..993e008 100644
--- a/drivers/scsi/libfc/fc_exch.c
+++ b/drivers/scsi/libfc/fc_exch.c
@@ -36,6 +36,7 @@ u16   fc_cpu_mask;            /* cpu mask for possible cpus */
 EXPORT_SYMBOL(fc_cpu_mask);
 static u16     fc_cpu_order;   /* 2's power to represent total possible cpus */
 static struct kmem_cache *fc_em_cachep;        /* cache for exchanges */
+struct workqueue_struct *fc_exch_workqueue;
 
 /*
  * Structure and function definitions for managing Fibre Channel Exchanges
@@ -354,8 +355,8 @@ static inline void fc_exch_timer_set_locked(struct fc_exch *ep,
 
        FC_EXCH_DBG(ep, "Exchange timer armed\n");
 
-       if (schedule_delayed_work(&ep->timeout_work,
-                                 msecs_to_jiffies(timer_msec)))
+       if (queue_delayed_work(fc_exch_workqueue, &ep->timeout_work,
+                              msecs_to_jiffies(timer_msec)))
                fc_exch_hold(ep);               /* hold for timer */
 }
 
@@ -1446,12 +1447,6 @@ static void fc_exch_reset(struct fc_exch *ep)
 
        spin_lock_bh(&ep->ex_lock);
        ep->state |= FC_EX_RST_CLEANUP;
-       /*
-        * we really want to call del_timer_sync, but cannot due
-        * to the lport calling with the lport lock held (some resp
-        * functions can also grab the lport lock which could cause
-        * a deadlock).
-        */
        if (cancel_delayed_work(&ep->timeout_work))
                atomic_dec(&ep->ex_refcnt);     /* drop hold for timer */
        resp = ep->resp;
@@ -1898,6 +1893,7 @@ void fc_exch_mgr_free(struct fc_lport *lport)
 {
        struct fc_exch_mgr_anchor *ema, *next;
 
+       flush_workqueue(fc_exch_workqueue);
        list_for_each_entry_safe(ema, next, &lport->ema_list, ema_list)
                fc_exch_mgr_del(ema);
 }
@@ -2069,6 +2065,9 @@ int fc_exch_init(struct fc_lport *lp)
        }
        fc_cpu_mask--;
 
+       fc_exch_workqueue = create_singlethread_workqueue("fc_exch_workqueue");
+       if (!fc_exch_workqueue)
+               return -ENOMEM;
        return 0;
 }
 EXPORT_SYMBOL(fc_exch_init);
@@ -2084,5 +2083,6 @@ int fc_setup_exch_mgr(void)
 
 void fc_destroy_exch_mgr(void)
 {
+       destroy_workqueue(fc_exch_workqueue);
        kmem_cache_destroy(fc_em_cachep);
 }
