Sorry, I forgot to include the chunk of code from the gmirror worker thread which made me suspect this could be the problem:

[..]

               /* Get first request from the queue. */
               mtx_lock(&sc->sc_queue_mtx);
               bp = bioq_first(&sc->sc_queue);
               if (bp == NULL) {
                       if ((sc->sc_flags &
                           G_MIRROR_DEVICE_FLAG_DESTROY) != 0) {
                               mtx_unlock(&sc->sc_queue_mtx);
                               if (g_mirror_try_destroy(sc)) {
                                       curthread->td_pflags &= ~TDP_GEOM;
                                       G_MIRROR_DEBUG(1, "Thread exiting.");
                                       kthread_exit(0);
                               }
                               mtx_lock(&sc->sc_queue_mtx);
                       }
                       sx_xunlock(&sc->sc_lock);
                       /*
                        * XXX: We can miss an event here, because an event
                        *      can be added without sx-device-lock and without
                        *      mtx-queue-lock. Maybe I should just stop using
                        *      dedicated mutex for events synchronization and
                        *      stick with the queue lock?
                        *      The event will hang here until next I/O request
                        *      or next event is received.
                        */
                       MSLEEP(sc, &sc->sc_queue_mtx, PRIBIO | PDROP, "m:w1",
                           timeout * hz);
                       sx_xlock(&sc->sc_lock);
                       G_MIRROR_DEBUG(5, "%s: I'm here 4.", __func__);
                       continue;
               }
               bioq_remove(&sc->sc_queue, bp);
               mtx_unlock(&sc->sc_queue_mtx);

Christian S.J. Peron wrote:

It almost looks as if a user frequently runs gmirror(8) to query the status of their array. Under high load the worker is busy, so at one unlucky moment, gmirror(8) is run:

   (1) gmirror(8) waits for sc->sc_lock owned by the worker
   (2) The worker then drops the lock
   (3) gmirror(8) proceeds
   (4) Worker wakes up and waits for sc->sc_lock
   (5) Only gmirror(8) never releases it, because it is waiting on a resource (presumably owned by the worker thread)?

I am not certain this is correct, so I have included pjd in the CC loop, hoping he can help shed some light on the subject :)
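
To make the suspected cycle in steps (4) and (5) concrete, here is a minimal userland sketch that models it as a wait-for graph and checks for a loop. It is not gmirror code: the thread ids, the waits_for[] array and the helper names are all hypothetical, and the edge "gmirror(8) waits on the worker" is exactly the unconfirmed assumption from step (5).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thread ids, used only for this illustration. */
enum { T_WORKER = 0, T_GMIRROR = 1, T_COUNT = 2 };
#define T_NONE (-1)

/*
 * waits_for[t] is the thread that owns whatever t is blocked on,
 * or T_NONE if t is runnable. Follow the chain from `start` and
 * report whether it leads back to `start`.
 */
static bool
has_wait_cycle(const int waits_for[], size_t n, int start)
{
	int t = waits_for[start];
	size_t steps;

	for (steps = 0; steps < n && t != T_NONE; steps++) {
		if (t == start)
			return (true);
		t = waits_for[t];
	}
	return (false);
}

/* Steps (4) and (5) together: each side waits on the other. */
static bool
scenario_with_step5(void)
{
	int waits_for[T_COUNT];

	/* (4) worker wants sc->sc_lock, now held by gmirror(8). */
	waits_for[T_WORKER] = T_GMIRROR;
	/* (5) ASSUMPTION: gmirror(8) waits on something the worker owns. */
	waits_for[T_GMIRROR] = T_WORKER;
	return (has_wait_cycle(waits_for, T_COUNT, T_WORKER));
}

/* Same as above, but gmirror(8) is not blocked on the worker. */
static bool
scenario_without_step5(void)
{
	int waits_for[T_COUNT];

	waits_for[T_WORKER] = T_GMIRROR;
	waits_for[T_GMIRROR] = T_NONE;
	return (has_wait_cycle(waits_for, T_COUNT, T_WORKER));
}
```

If step (5) does not hold, there is no cycle and the worker only waits until gmirror(8) drops sc->sc_lock, so whether the hang is a real deadlock depends entirely on that one edge.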




_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"