Roland Dreier wrote:
> +static int srp_dev_loss_tmo = 60;
I don't think the name needs to be this abbreviated. We don't
necessarily need the srp_ prefix, but probably "device_loss_timeout" is
much clearer without being too much longer.
OK
> +
> +module_param(srp_dev_loss_tmo, int, 0444);
> +MODULE_PARM_DESC(srp_dev_loss_tmo,
> + "Default number of seconds that srp transport should \
> + insulate the lost of a remote port (default is 60 secs");
I can't understand this description. What does "insulate the lost" of a
port mean?
I should change "remote port" to just "port". It means that the multipath
driver won't learn about a port-offline event (a pulled cable, a
power-cycled switch or target, ...) and won't fail over, because srp won't
return an error code until this timeout has expired.
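To make the intended behaviour concrete, here is a minimal user-space sketch of the decision described above: errors are held back until the port has been offline for the whole timeout. The function name and the seconds-based arguments are made up for illustration and are not part of the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not from the patch: returns true once the port has
 * been offline for at least the device loss timeout, i.e. the point at
 * which errors would be passed up to layers such as multipath.
 */
static bool should_report_port_loss(unsigned long now_secs,
				    unsigned long offline_since_secs,
				    unsigned long device_loss_timeout_secs)
{
	return now_secs - offline_since_secs >= device_loss_timeout_secs;
}
```

With a 60 second timeout, a port that went offline at t=70 is still hidden from multipath at t=100 and is reported from t=130 on.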
> +static void srp_reconnect_work(struct work_struct *work)
> +{
> + struct srp_target_port *target =
> + container_of(work, struct srp_target_port, work);
> +
> + srp_reconnect_target(target);
> + target->work_in_progress = 0;
surely this is racy... isn't it possible for a context to see
work_in_progress as 1, decide not to schedule the work, and then have it
set to 0 immediately afterwards by the workqueue context?
Yes, it is racy. The flag should be checked and updated with the scsi
host_lock held (spin_lock_irq).
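A rough user-space sketch of the locking idea, assuming the check and the update of the flag are done under one lock. A pthread mutex stands in for the host_lock here, and the struct and function names are invented for the example; this is not the ib_srp code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the target; not the real srp_target_port. */
struct fake_target {
	pthread_mutex_t lock;		/* plays the role of the scsi host_lock */
	bool work_in_progress;
	int reconnects_queued;		/* counts how often work was "scheduled" */
};

/* Test and set the flag in one critical section; true means we own the work. */
static bool claim_reconnect(struct fake_target *t)
{
	bool claimed = false;

	pthread_mutex_lock(&t->lock);
	if (!t->work_in_progress) {
		t->work_in_progress = true;
		t->reconnects_queued++;
		claimed = true;
	}
	pthread_mutex_unlock(&t->lock);
	return claimed;
}

/* Called by the worker when the reconnect attempt is done. */
static void finish_reconnect(struct fake_target *t)
{
	pthread_mutex_lock(&t->lock);
	t->work_in_progress = false;
	pthread_mutex_unlock(&t->lock);
}
```

This only makes the check-and-set atomic. An error that arrives while a reconnect is still in progress is still skipped, so whether that case needs a re-check after the work finishes is a separate question.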
> + target->qp_err_timer.expires = time * HZ + jiffies;
given that this is only with 1 second resolution, probably makes sense
to either make it a deferrable timer or round the timeout to avoid extra
wakeups.
OK - I'll round the timeout.
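For illustration, a simplified user-space model of what rounding the expiry could look like: push it to the next whole-second boundary so that timers with 1 second resolution tend to fire together. The kernel has its own helpers for this (round_jiffies() among them), which behave differently in detail; SKETCH_HZ and both function names below are assumptions made for the example.

```c
/* Assumed tick rate for this sketch only; the kernel's HZ is a build-time setting. */
#define SKETCH_HZ 250UL

/* Round an absolute expiry (in ticks) up to the next whole-second boundary. */
static unsigned long round_up_to_second(unsigned long expires)
{
	unsigned long rem = expires % SKETCH_HZ;

	return rem ? expires + (SKETCH_HZ - rem) : expires;
}

/* Expiry for a timeout of `secs` seconds that starts at tick `now`. */
static unsigned long rounded_expiry(unsigned long now, unsigned long secs)
{
	return round_up_to_second(now + secs * SKETCH_HZ);
}
```

Rounding up means the timer never fires earlier than requested, at the cost of up to one extra second.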
> + add_timer(&target->qp_err_timer);
I don't see anywhere that this is canceled on module unload etc?
My mistake. Bart also pointed it out. I'll fix this.
> + srp_qp_err_add_timer(target,
> + srp_dev_loss_tmo - 55);
> + if (srp_dev_loss_tmo < 60)
> + srp_dev_loss_tmo = 60;
I don't understand the 55 and the 60 here... what are these magic
numbers? Wouldn't it make sense for the user to specify the actual
timeout that is used (value - 55) rather than the value and then
secretly subtracting 55?
- R.
First, it does not make sense for the user to set it below 60; therefore, it
is forced to be 60 or above.
With the async event handler, srp can detect a local port going offline and
set the timer to exactly device_loss_timeout; however, it has no mechanism to
detect a remote port going offline (srp_daemon would need to register for
traps and communicate remote ports entering/leaving the fabric down to the
srp driver).
I should just add a timer of X seconds instead of (device_loss_tmo - 55)
when receiving a cqe error and/or a connection close event.
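A small sketch of how that could look with the magic numbers given names. The floor of 60 comes from the discussion above; the value of X is left as a parameter because it has not been decided here, and the enum, macro, and function names are invented for the example.

```c
/* Hypothetical event classes, named after the cases discussed above. */
enum loss_event { LOCAL_PORT_OFFLINE, CQE_ERROR, CONN_CLOSED };

/* Floor for the user-visible timeout, in seconds. */
#define MIN_DEVICE_LOSS_TIMEOUT 60

/* Force the module parameter up to the minimum if it was set too low. */
static int clamp_device_loss_timeout(int requested)
{
	return requested < MIN_DEVICE_LOSS_TIMEOUT ?
	       MIN_DEVICE_LOSS_TIMEOUT : requested;
}

/*
 * Pick the timer length in seconds. A local port going offline is detected
 * directly, so the full timeout applies. For cqe errors and connection
 * close events the caller passes the fixed grace period ("X" above).
 */
static int timer_seconds(enum loss_event ev, int device_loss_timeout,
			 int remote_grace_secs)
{
	if (ev == LOCAL_PORT_OFFLINE)
		return clamp_device_loss_timeout(device_loss_timeout);
	return remote_grace_secs;
}
```

This way the user sets the timeout that is actually used for the local case, and the second value is visible instead of being derived by subtracting 55.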
-vu