On 10/12/2016 05:26 PM, Ewan D. Milne wrote:
On Tue, 2016-10-11 at 10:51 -0400, Ewan D. Milne wrote:
On Sat, 2016-10-08 at 19:35 +0200, Hannes Reinecke wrote:
You might actually be hitting a limitation in the exchange manager code.
The libfc exchange manager tries to be really clever and assigns a
per-CPU exchange manager (probably to increase locality). However, we
only have a limited number of exchanges, so on large systems we might
actually run into an exchange starvation problem, where in theory we have
enough free exchanges, but none for the submitting CPU.

(Personally, I think the exchange manager code is in urgent need of
reworking; it should be replaced by Omar's sbitmap code.)

Do check how many free exchanges are actually present for the stalling
CPU; you might be running into a starvation issue.
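A toy C model of the starvation scenario, assuming each CPU owns a fixed, private slice of the exchange space. NR_POOLS, SLOTS_PER_POOL, alloc_on_cpu(), free_on_cpu() and free_total() are hypothetical names, and the sizes are far smaller than any real configuration. Allocation only looks at the submitting CPU's slice, so it can fail while the global free count is still high:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a per-CPU partitioned exchange pool.
 * NR_POOLS and SLOTS_PER_POOL are illustrative values only. */
#define NR_POOLS       4
#define SLOTS_PER_POOL 2

struct xid_pool {
	bool in_use[SLOTS_PER_POOL];
};

/* Zero-initialized: every slot starts out free. */
static struct xid_pool pools[NR_POOLS];

/* Allocate a slot from the submitting CPU's pool only; return -1 when
 * that pool is exhausted, even if other pools still have free slots. */
static int alloc_on_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_POOLS);
	for (int i = 0; i < SLOTS_PER_POOL; i++) {
		if (!pools[cpu].in_use[i]) {
			pools[cpu].in_use[i] = true;
			return i;
		}
	}
	return -1; /* the caller would end up reporting "host busy" */
}

/* Free slots in one CPU's pool (the number to check for a stalling CPU). */
static int free_on_cpu(int cpu)
{
	int n = 0;
	assert(cpu >= 0 && cpu < NR_POOLS);
	for (int i = 0; i < SLOTS_PER_POOL; i++)
		n += !pools[cpu].in_use[i];
	return n;
}

/* Free slots summed over all CPUs' pools. */
static int free_total(void)
{
	int n = 0;
	for (int cpu = 0; cpu < NR_POOLS; cpu++)
		n += free_on_cpu(cpu);
	return n;
}
```

For example, after two allocations on CPU 0, a third alloc_on_cpu(0) returns -1 even though free_total() still reports 6 free slots.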

We are still looking into this, but one thing that looks bad is that
the exchange manager code rounds up the number of CPUs to the next
power of 2 before dividing up the exchange id space (and uses the least
significant bits of the xid to extract the CPU when looking up an xid).
We have a machine with 288 CPUs; this code is just begging for a rewrite,
as it looks to be wasting most of the limited xid space on ixgbe FCoE.
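A minimal C sketch of the xid layout as described above, assuming xids are counted from a min_xid and that the CPU number sits in the low-order bits while the per-CPU pool index sits above it. roundup_pow2(), cpu_order(), make_xid(), xid_to_cpu() and wasted_slots() are hypothetical helpers for illustration, not libfc functions:

```c
#include <assert.h>

/* Smallest power of two >= n (n >= 1). */
static unsigned int roundup_pow2(unsigned int n)
{
	unsigned int p = 1;
	assert(n >= 1);
	while (p < n)
		p <<= 1;
	return p;
}

/* Number of low-order xid bits used to encode the CPU. */
static unsigned int cpu_order(unsigned int nr_cpus)
{
	unsigned int p = roundup_pow2(nr_cpus), order = 0;
	while ((1u << order) < p)
		order++;
	return order;
}

/* Build an xid from a CPU number and an index into that CPU's pool. */
static unsigned int make_xid(unsigned int min_xid, unsigned int nr_cpus,
			     unsigned int cpu, unsigned int index)
{
	return min_xid + ((index << cpu_order(nr_cpus)) | cpu);
}

/* Recover the CPU from an xid by masking the low-order bits. */
static unsigned int xid_to_cpu(unsigned int min_xid, unsigned int nr_cpus,
			       unsigned int xid)
{
	return (xid - min_xid) & (roundup_pow2(nr_cpus) - 1);
}

/* xids whose CPU field names a CPU number that does not exist
 * (nr_cpus .. roundup_pow2(nr_cpus) - 1); no CPU ever allocates them. */
static unsigned int wasted_slots(unsigned int nr_xids, unsigned int nr_cpus)
{
	unsigned int slots = roundup_pow2(nr_cpus);
	return nr_xids / slots * (slots - nr_cpus);
}
```

With 288 CPUs the CPU field is 9 bits wide, so in this model the xids tagged with CPU numbers 288 through 511 are never handed out (224 of 512 offloaded xids), on top of the slices owned by CPUs that never submit I/O.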

It looks like we get 512 offloaded xids on this adapter and 4096 - 512
non-offloaded xids. With 288 CPUs rounded up to 512, this gives each CPU
1 offloaded + 7 non-offloaded xids. However, I'm not sure that even
4096 / 288 = 14 would be enough to prevent stalling.
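The arithmetic behind these per-CPU figures, assuming the 288 CPUs are rounded up to $2^9 = 512$ slots for both pools:

```latex
\frac{512}{2^{\lceil \log_2 288 \rceil}} = \frac{512}{512} = 1,
\qquad
\frac{4096 - 512}{512} = \frac{3584}{512} = 7,
\qquad
\left\lfloor \frac{4096}{288} \right\rfloor = 14
```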

And, of course, potentially most of the CPUs aren't submitting I/O, so
the whole idea of per-CPU xid space is questionable.

fc_exch_alloc() used to try all the available exchange managers in the
list for a free exchange id, but this was changed in 2010 so that if the
first matching exchange manager can't allocate one, the allocation fails
and we end up returning host busy. This was due to commit:

commit 3e22760d4db6fd89e0be46c3d132390a251da9c6
Author: Vasu Dev <vasu....@intel.com>
Date:   Fri Mar 12 16:08:39 2010 -0800

    [SCSI] libfc: use offload EM instance again instead jumping to next EM

    Since use of offloads is more efficient than switching
    to non-offload EM. However kept logic same to call em_match
    if it is provided in the list of EMs.

    Converted fc_exch_alloc to inline being now tiny a function
    and already not an exported libfc API any more.

    Signed-off-by: Vasu Dev <vasu....@intel.com>
    Signed-off-by: Robert Love <robert.w.l...@intel.com>
    Signed-off-by: James Bottomley <james.bottom...@suse.de>


Setting the fcoe module's ddp_min parameter to 128MB prevents the ->match
function from selecting the offload exchange manager for the frame, so the
larger non-offloaded pool is used instead and we no longer see the host
busy status.
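A minimal sketch of a size-gated match callback, assuming (as a simplification) that the decision depends only on whether the frame starts a read or write data transfer and on its length. ddp_min and offload_match() here are hypothetical stand-ins for the fcoe module parameter and its ->match callback, and the 4096-byte initial value is only illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fcoe ddp_min module parameter (bytes). */
static size_t ddp_min = 4096;

/* Hypothetical match callback: permit the offload exchange manager only
 * for read/write data transfers larger than ddp_min; everything else
 * falls through to the non-offloaded pool. */
static bool offload_match(bool is_data_xfer, size_t xfer_len)
{
	return is_data_xfer && xfer_len > ddp_min;
}
```

Raising ddp_min to 128MB (134217728 bytes) makes offload_match() return false for every smaller transfer, so those frames are served from the non-offloaded exchange manager.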

Yes, this is also the impression I got from reading the spec.
The offload pool is mainly designed for large read or write commands, so using it for _every_ frame is probably not a good idea. Limiting it by transfer size solves the problem quite nicely, as large transfers are typically only issued by reads and writes.
So please send a patch to revert that.


Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)