Roland Dreier wrote:
> + if (timer_pending(&target->qp_err_timer))
> + del_timer_sync(&target->qp_err_timer);
> +
> spin_unlock_irq(target->scsi_host->host_lock);
As was pointed out, I don't think you can do del_timer_sync while
holding the lock, since the timer function takes the same lock.
I don't think that it takes the same lock (ie. host_lock).; however, the
problem is kernel configured CONFIG_LOCKDEP, del_timer_sync() call
local_irq_save/restore() to acquire/release timer->lockdep_map
But I don't know that just switching to del_timer without the sync works
here ... without the sync then the timeout function could still run any
time after the del_timer, even after everything gets freed.
I will move del_timer_sync() of the host_lock. It's safe because we only
setup the same timer when the target state is ALIVE, however, target
state is already REMOVED when del_timer_sync() is called
BTW the test of timer_pending isn't needed here... del_timer does the
test internally anyway.
I do agree it would be very good to improve the SRP error handling. I
have some concerns about the overall design here -- it seems that if we
handle connection failure and allow a new connection to proceed while
cleaning up asynchronously, then this opens the door to a lot of
complexity, and I don't see that complexity handled in this patchset.
For example, the new connection could fail too before the old one is
done cleaning up, etc, etc and we end up with an arbitrarily large queue
of things waiting to clean up. Or maybe it really it is simpler than that.
It's is complicated; however, it is not as complicated as you said.
There are two flows/places where we clean-up connection and create new
one (ie. call srp_reconnect_target) in async event timer and scsi error
handling srp_reset_host(); however, in srp_reset_host() the patch 3/4
check for true condition of timer_pending or qp_in_error or target
state wrapped in host_lock then it won't call srp_reconnect_target(). It
left one flow/place to clean up / create new connection.
We destroy cm connection/cq/qp and create new connection/cq/qp in the
same flow/place; we also already clean up all scsi/srp resources
associated with old connection before create new one; therefore, the
only resources pended for clean up is cm/connection resources, qp/cq
low-level resources. And I believe the lower IB stack does that well.
I think the best way to move this forward would be to post another
cleaned up version of your patch set. Please try to reorganize things
so each patch is reasonably self contained. Of course your patchset is
taking multiple steps to improve things. But as much as possible,
please try to avoid combining two things into a single patch, and
conversely also try to avoid putting things into a patch that don't make
sense without a later patch.
Avoiding policy in the kernel as much as possible in terms of hard-coded
timeouts etc is a good goal too.
Also it would help to give each patch a separate descriptive subject,
and put as much detail in the changelogs as you can.
Thanks,
Roland
I'll follow the instructions above, rework the patch set and resubmit
thanks,
-vu
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html