On Wed, 17 May 2017, 11:01am, Sebastian Andrzej Siewior wrote:
> On 2017-05-12 11:55:52 [-0400], Chad Dupuis wrote:
> > Ok, I believe I've found the issue here. The machine that the test was
> > performed on had many more possible CPUs than active CPUs. We calculate
> > which CPU to queue the work on in bnx2fc_process_new_cqes() like this:
> >
> > unsigned int cpu = wqe % num_possible_cpus();
> >
> > Since not all CPUs are active, we were trying to schedule work on
> > non-active CPUs, which meant that the upper layers were never notified
> > of the completion. With this change:
> >
> > diff --git a/drivers/scsi/bnx2fc/bnx2fc_hwi.c b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
> > index c2288d6..6f08e43 100644
> > --- a/drivers/scsi/bnx2fc/bnx2fc_hwi.c
> > +++ b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
> > @@ -1042,7 +1042,12 @@ static int bnx2fc_process_new_cqes(struct bnx2fc_rport *tgt)
> >  			/* Pending work request completion */
> >  			struct bnx2fc_work *work = NULL;
> >  			struct bnx2fc_percpu_s *fps = NULL;
> > -			unsigned int cpu = wqe % num_possible_cpus();
> > +			unsigned int cpu = wqe % num_active_cpus();
> > +
> > +			/* Sanity check cpu to make sure it's online */
> > +			if (!cpu_active(cpu))
> > +				/* Default to CPU 0 */
> > +				cpu = 0;
> >
> >  			work = bnx2fc_alloc_work(tgt, wqe);
> >  			if (work) {
> >
> > The issue is fixed.
> >
> > Sebastian, can you add this change to your patch set?
>
> Are you sure that you can reliably reproduce the issue and fix it with
> the patch above? Because this patch:
Yes, it was reproducible each time we started the FCoE interface. With
the above patch, our sanity test passed with no issues seen.
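
Just as an illustration (an untested sketch on my end, not part of the
patch above): instead of hard-coding CPU 0 as the fallback, the cpumask
helpers could pick an online CPU directly, e.g. something like

	/*
	 * Untested illustration only: cpumask_local_spread() reduces the
	 * index modulo num_online_cpus() and only ever returns an online
	 * CPU, so no explicit CPU 0 fallback is needed.  A CPU could
	 * still go offline between this lookup and schedule_work_on().
	 */
	unsigned int cpu = cpumask_local_spread(wqe, NUMA_NO_NODE);

but the num_active_cpus()/cpu_active() version above is what we actually
tested.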
>
> diff --git a/init/main.c b/init/main.c
> index b0c11cbf5ddf..483a971b1fd2 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -997,6 +997,12 @@ static int __ref kernel_init(void *unused)
> "See Linux Documentation/admin-guide/init.rst for guidance.");
> }
>
> +static void workfn(struct work_struct *work)
> +{
> + pr_err("%s() %d\n", __func__, raw_smp_processor_id());
> +}
> +static DECLARE_WORK(work, workfn);
> +
> static noinline void __init kernel_init_freeable(void)
> {
> /*
> @@ -1040,6 +1046,15 @@ static noinline void __init kernel_init_freeable(void)
>
>  	(void) sys_dup(0);
>  	(void) sys_dup(0);
> +	{
> +
> +		cpu_down(3);
> +		pr_err("%s() num possible: %d\n", __func__, num_possible_cpus());
> +		pr_err("%s() num online : %d\n", __func__, num_online_cpus());
> +		pr_err("%s() 3 active : %d\n", __func__, cpu_active(3));
> +		schedule_work_on(3, &work);
> +		ssleep(5);
> +	}
>  	/*
>  	 * check if there is an early userspace init. If yes, let it do all
>  	 * the work
>
> produces this output:
> [ 1.960313] Unregister pv shared memory for cpu 3
> [ 1.997000] kernel_init_freeable() num possible: 8
> [ 1.998073] kernel_init_freeable() num online : 7
> [ 1.999125] kernel_init_freeable() 3 active : 0
> [ 2.000337] workfn() 1
>
> which means CPU3 is offline and the work runs on CPU1 instead. So it
> already does what you suggest, except that chances are it will not run
> on CPU0 in this case (but on another CPU).
>
> So either it takes some time for wait_for_completion(&io_req->tm_done)
> to come back, _or_ there is a leak somewhere: a complete() that is
> somehow missing or racing against something.
>
> Sebastian
>
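
To make sure we are looking at the same thing, the tm_done handling is
the usual completion pattern, roughly (sketch only, not the exact bnx2fc
code):

	/*
	 * Rough sketch of the pattern being discussed, not the exact
	 * bnx2fc code.  The waiter blocks until the deferred work calls
	 * complete(); if that work was queued to a CPU that never runs
	 * it, or an error path skips complete(), the waiter never returns.
	 */
	init_completion(&io_req->tm_done);
	/* ... post the task management request to the hardware ... */
	wait_for_completion(&io_req->tm_done);

	/* completion side, executed from the per-CPU work item: */
	complete(&io_req->tm_done);

So if the work item that carries the completion is queued to a CPU that
never processes it, the waiter above never returns, which matches the
hang we saw before the change.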