On Thu, 2010-09-02 at 07:14 +0000, Bhanu Gollapudi wrote:
> On Wed, 2010-09-01 at 15:49 -0700, Robert Love wrote:
> > On Tue, 2010-08-31 at 23:14 -0700, Bhanu Gollapudi wrote:
> > > On Tue, 2010-08-31 at 18:26 -0700, Robert Love wrote: 
> > > > On Wed, 2010-08-25 at 01:31 +0000, Bhanu Gollapudi wrote:
> > > > > On systems with higher nr_cpu_ids, the per cpu exchange pool will
> > > > > have very limited exchange resources. This can cause certain
> > > > > operations such as discovery to fail even with a finite number
> > > > > of retries. This problem is handled by dividing the entire
> > > > 
> > > > Hi Bhanu,
> > > > 
> > > >    Can you tell me a bit more about your scenario and how many CPUs
> > > > you're dealing with? Is it an offload EM or non-offload EM that's
> > > > running out of resources?
> > > 
> > > Hi Robert, 
> > > 
> > > I described the scenario in one of the earlier emails
> > > http://www.mail-archive.com/[email protected]/msg07738.html
> > > 
> > Ah. Thanks for the reference.
> > 
> > I think this is a locking issue and not a resource problem. The lack of
> > resources definitely triggers the problem, but I think the system hangs
> > due to a deadlock. I don't think that discovery should ever
> > completely fail. The discovery engine should retry 3 times and then the
> > lport state machine should restart causing discovery to restart.
> > Eventually discovery should succeed once the exchanges become available.
> > 
> > I think that you pointed out the root cause of the hang in your initial
> > posting. I think that the disc_work context is trying to
> > cancel_delayed_work_sync() on itself.
> > 
> > As the result of an exchange allocation failure, fc_disc_error() is
> > called and disc_work is scheduled with a delay. When the work is
> > executed, fc_disc_timeout() runs in the disc_work delayed-work
> > context. This again fails to allocate an exchange for GPN_FT, which calls
> > fc_disc_error() again. This time the retry_count has been exceeded and
> > disc_done() is called. This in turn calls fc_lport_disc_callback(FAILED)
> > which then calls fc_lport_reset(), which calls fc_lport_reset_locked().
> > fc_lport_reset_locked() calls fc_disc_done() which calls
> > cancel_delayed_work_sync(&disc->disc_work). As you suspected this is
> > blocking as it tries to cancel the disc_work, which is the context that
> > we're in.
> 
> Absolutely, this is the real root cause, and it needs to be fixed.
> However, running out of resources can lead to this hang quickly, and
> even if we fix it, discovery takes a long time to complete after many
> retries, since we use only a subset of the xid resources. Also, I was
> not very happy to see the xid resources shrink as the number of CPUs
> in the system grows, and hence submitted this change. Is there any
> downside to having the common pool?
> 
No, not that I can think of, at least not with the non-offloaded EM. I
just wanted to point out that making the common pool wouldn't
necessarily _fix_ your problem, only make it less likely.

I think that the OEM would not want the common pool though. There are
tricks that can be done to bind irqs to CPUs such that I/O stays on the
same CPU. This is done by ensuring that certain XIDs are always mapped
to certain CPUs. With a common pool for the OEM, that mapping isn't
consistent for the common XIDs.

I want to bring up a slightly unrelated topic as a talking point. I
discovered yesterday that NPIV is broken. This happened when we added
destination MAC address checking to fcoe.ko. The problem is that we're
only checking the N_Port's MAC address and not walking the list of NPIV
VN_Ports to check their MACs. The result is that we fail the validation
of all NPIV VN_Ports, so what you see on the wire is a successful FDISC,
a PLOGI to the name server and an ACC in response. The initiator then
sends an ABTS for the PLOGI to the name server because fcoe.ko prevents
libfc from seeing the ACC response.

Yesterday, we debated a few solutions to this problem and the winning
solution seems to be to make the lport's VN_Port list read-safe so that
we can walk that list to check MACs without grabbing a lock (Chris'
idea). Currently, we only pass the N_Port lport up to libfc, find the
exchange from its EM and then call the upper layer callback (i.e.
fc_rport_plogi_resp()). It isn't until we're in the callback that we
look for the lport.

I realize that this isn't your problem, but a side effect of the fix is
that fcoe.ko passes the correct lport up to libfc (fc_exch_recv()). This means
that we don't need to rely on the exchange lookup to find the "real"
lport, which would allow us to have an EM per-lport. This allows for
scaling as NPIV ports are added, but doesn't really help with CPU
scaling.

Thanks, //Rob

_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel