On 03.01.23 01:57, Peng Fan wrote: >> To: Peng Fan <[email protected]>; [email protected] >> Subject: Re: Question: ivshmem-net >> >> On 21.11.22 07:47, Peng Fan wrote: >>> Hi, >>> >>> I have a question regarding ivshmem-net, not sure whether we need to >> use local_bh_disable/enable to guard napi_schedule. >>> >>> See below patch: >>> >> https://patc >>> hes.linaro.org%2Fproject%2Flinux- >> usb%2Fpatch%2F877dk162mo.ffs%40nanos. >>> >> tec.linutronix.de%2F&data=05%7C01%7Cpeng.fan%40nxp.com%7C10674af >> e061c4 >>> >> fbdd5cd08daec94e54a%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0 >> %7C63808 >>> >> 2421486787248%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiL >> CJQIjoiV2lu >>> >> MzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=nxYB >> 9%2BjoTRb >>> hTF7GGaa32nmr7hU7zCk2fEOL5ZbbL34%3D&reserved=0 >>> " >>> The driver invokes napi_schedule() in several places from task >>> context. napi_schedule() raises the NET_RX softirq bit and relies on >>> the calling context to ensure that the softirq is handled. That's >>> usually on return from interrupt or on the outermost local_bh_enable(). >>> >>> But that's not the case here which causes the soft interrupt handling >>> to be delayed to the next interrupt or local_bh_enable(). If the task >>> in which context this is invoked is the last runnable task on a CPU >>> and the CPU goes idle before an interrupt arrives or a >>> local_bh_disable/enable() pair handles the pending soft interrupt then >>> the NOHZ idle code emits the following warning. >>> >>> NOHZ tick-stop error: Non-RCU local softirq work is pending, handler >> #08!!! >>> >>> Prevent this by wrapping the napi_schedule() invocation from task >>> context into a local_bh_disable/enable() pair. >>> " >>> >> https://lore >>> .kernel.org%2Fall%2F87y28b9nyn.ffs%40tglx%2Ft%2F&data=05%7C01%7C >> peng.f >>> >> an%40nxp.com%7C10674afe061c4fbdd5cd08daec94e54a%7C686ea1d3bc2b >> 4c6fa92c >>> >> d99c5c301635%7C0%7C0%7C638082421486787248%7CUnknown%7CTWFp >> bGZsb3d8eyJW >>> >> IjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7 >> C3000% >>> >> 7C%7C%7C&sdata=g%2FR48KwZCOVGt6momplzAv5VwmWpykE6c0e29pCpv >> 9k%3D&reserv >>> ed=0 >>> >>> I draft one: >>> diff --git a/drivers/net/ivshmem-net.c b/drivers/net/ivshmem-net.c >>> index 3bcd39b91176e..81e19d80bd0a7 100644 >>> --- a/drivers/net/ivshmem-net.c >>> +++ b/drivers/net/ivshmem-net.c >>> @@ -558,7 +558,9 @@ static void ivshm_net_run(struct net_device *ndev) >>> >>> netif_start_queue(ndev); >>> napi_enable(&in->napi); >>> + local_bh_disable(); >>> napi_schedule(&in->napi); >>> + local_bh_enable(); >>> ivshm_net_set_state(in, IVSHM_NET_STATE_RUN); } >>> >>> There are other places calling napi_schedule, but seems no need >> local_bh_disable/enable to protect. >>> >>> Not sure the upper change is valid or not, please help check. >>> >> >> Looks suspicious, indeed. I wonder why there are not runtime checks in >> napi_schedule to detect its needs and report a wrong bh state. >> >> Does this issue only trigger with NOHZ enabled? > [Peng Fan] > > I not tested other configurations, but I think so from various > lkml thread reading. >
Ok, makes sense. And Peter even wrote a detection patch as it seems. Would you write a proper patch from your change, maybe also explaining why the other spots do not need this? Then I could merge it into queues/jailhouse where we track the driver history (it's usually folded in stable jailhouse-enabling branches). Thanks, Jan -- Siemens AG, Technology Competence Center Embedded Linux -- You received this message because you are subscribed to the Google Groups "Jailhouse" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/jailhouse-dev/5786bf3d-f0f2-48d3-609e-a9fdc68de40f%40siemens.com.
