----- Original Message -----
> From: "Hannes Reinecke" <h...@suse.de>
> To: emi...@redhat.com, "vasu dev" <vasu....@intel.com>, "robert w love"
> Cc: "Laurence Oberman" <lober...@redhat.com>, "Linux SCSI Mailinglist"
> email@example.com, "Curtis Taylor (c...@us.ibm.com)"
> <c...@us.ibm.com>, "Bud Brown" <bubr...@redhat.com>
> Sent: Wednesday, October 12, 2016 11:46:16 AM
> Subject: Re: [Open-FCoE] Issue with fc_exch_alloc failing initiated by
> fc_queuecommand on NUMA or large
> configurations with Intel ixgbe running FCOE
> On 10/12/2016 05:26 PM, Ewan D. Milne wrote:
> > On Tue, 2016-10-11 at 10:51 -0400, Ewan D. Milne wrote:
> >> On Sat, 2016-10-08 at 19:35 +0200, Hannes Reinecke wrote:
> >>> You might actually be hitting a limitation in the exchange manager code.
> >>> The libfc exchange manager tries to be really clever and will assign a
> >>> per-cpu exchange manager (probably to increase locality). However, we
> >>> only have a limited number of exchanges, so on large systems we might
> >>> actually run into an exchange starvation problem, where we have in theory
> >>> enough free exchanges, but none for the submitting cpu.
> >>> (Personally, the exchange manager code is in urgent need of reworking.
> >>> It should be replaced by the sbitmap code from Omar).
> >>> Do check how many free exchanges are actually present for the stalling
> >>> CPU; it might be that you run into a starvation issue.
> >> We are still looking into this but one thing that looks bad is that
> >> the exchange manager code rounds up the number of CPUs to the next
> >> power of 2 before dividing up the exchange id space (and uses the lsbs
> >> of the xid to extract the CPU when looking up an xid). We have a machine
> >> with 288 CPUs; this code is just begging for a rewrite, as it looks to
> >> be wasting most of the limited xid space on ixgbe FCoE.
> >> Looks like we get 512 offloaded xids on this adapter and 4096-512
> >> non-offloaded xids. This would give 1 + 7 xids per CPU. However, I'm
> >> not sure that even 4096 / 288 = 14 would be enough to prevent stalling.
> >> And, of course, potentially most of the CPUs aren't submitting I/O, so
> >> the whole idea of per-CPU xid space is questionable.
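For reference, the xid arithmetic described above can be sketched in C. This is a standalone illustration, not the libfc code: the function names are made up, and the only inputs are the figures quoted in this thread (288 CPUs, 512 offloaded xids, 4096 xids in total). It assumes the pool is divided evenly across the CPU count rounded up to a power of two, and that the low bits of the xid select the CPU.

```c
/* Hypothetical helpers; names are illustrative, not taken from libfc. */

/* Round n up to the next power of two (expects n >= 1). */
unsigned int round_up_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * xids each CPU slot receives when a pool of 'pool' xids is split
 * across round_up_pow2(ncpus) slots (assumed even split).
 */
unsigned int xids_per_cpu(unsigned int pool, unsigned int ncpus)
{
	return pool / round_up_pow2(ncpus);
}

/* CPU slot encoded in the least significant bits of an xid (assumed layout). */
unsigned int xid_to_cpu(unsigned int xid, unsigned int ncpus)
{
	return xid & (round_up_pow2(ncpus) - 1);
}
```

With 288 CPUs the divisor becomes 512, so xids_per_cpu(512, 288) gives 1 and xids_per_cpu(4096 - 512, 288) gives 7, matching the 1 + 7 figure; dividing 4096 by 288 directly gives 14.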
> > fc_exch_alloc() used to try all the available exchange managers in the
> > list for an available exchange id, but this was changed in 2010 so that
> > if the first matched exchange manager couldn't allocate one, it fails
> > and we end up returning host busy. This was due to commit:
> > commit 3e22760d4db6fd89e0be46c3d132390a251da9c6
> > Author: Vasu Dev <vasu....@intel.com>
> > Date: Fri Mar 12 16:08:39 2010 -0800
> > [SCSI] libfc: use offload EM instance again instead jumping to next EM
> > Since use of offloads is more efficient than switching
> > to non-offload EM. However kept logic same to call em_match
> > if it is provided in the list of EMs.
> > Converted fc_exch_alloc to inline being now tiny a function
> > and already not an exported libfc API any more.
> > Signed-off-by: Vasu Dev <vasu....@intel.com>
> > Signed-off-by: Robert Love <robert.w.l...@intel.com>
> > Signed-off-by: James Bottomley <james.bottom...@suse.de>
> > ---
> > Setting the fcoe ddp_min module parameter to 128MB prevents the ->match
> > function from permitting the use of the offload exchange manager for the
> > frame, and we no longer see the problem with host busy status, since it
> > uses the larger non-offloaded pool.
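The difference between the two allocation policies, and why raising ddp_min avoids the failure, can be modelled in a few lines of C. This is a toy model under stated assumptions, not the kernel implementation: struct em_model, em_matches(), alloc_try_all() and alloc_first_match() are hypothetical names, each exchange manager is reduced to a free-xid counter, and the match rule is assumed to be a simple transfer-size threshold.

```c
#include <stddef.h>

/* Hypothetical model of one exchange manager (not the libfc structure). */
struct em_model {
	unsigned int free_xids;	/* exchange ids still available in this pool */
	size_t min_len;		/* 0: matches every frame; else minimum transfer size */
};

/* Assumed match rule: a pool with a threshold only takes large transfers. */
static int em_matches(const struct em_model *em, size_t xfer_len)
{
	return em->min_len == 0 || xfer_len >= em->min_len;
}

/* Take one xid from the pool; returns 1 on success, 0 if the pool is empty. */
static int em_take(struct em_model *em)
{
	if (em->free_xids == 0)
		return 0;
	em->free_xids--;
	return 1;
}

/*
 * Older policy as described in this thread: keep walking the list until
 * some matching pool can supply an xid. Returns the pool index or -1.
 */
int alloc_try_all(struct em_model *ems, size_t n, size_t xfer_len)
{
	for (size_t i = 0; i < n; i++)
		if (em_matches(&ems[i], xfer_len) && em_take(&ems[i]))
			return (int)i;
	return -1;
}

/*
 * Policy after the 2010 change as described in this thread: only the
 * first matching pool is tried, and its failure is final ("host busy").
 */
int alloc_first_match(struct em_model *ems, size_t n, size_t xfer_len)
{
	for (size_t i = 0; i < n; i++)
		if (em_matches(&ems[i], xfer_len))
			return em_take(&ems[i]) ? (int)i : -1;
	return -1;
}
```

With an exhausted offload pool listed first and a non-offload pool holding free xids, alloc_first_match() fails for a transfer that matches the offload pool, while alloc_try_all() falls through to the second pool. Raising min_len on the offload pool above the transfer size makes alloc_first_match() skip it and succeed from the non-offload pool.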
> Yes, this is also the impression I got from reading the spec.
> The offload pool is mainly designed for large read or write commands, so
> using it for _every_ frame is probably not a good idea.
> And limiting it by the size of the transfers solves the problem quite
> nicely, as a large size typically is only used by reads and writes.
> So please send a patch to revert that.
> Dr. Hannes Reinecke zSeries & Storage
> h...@suse.de +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
I will revert the commit and test it here in the lab, and then submit the
patch so Ewan can review.