Hello Marek,

On Wed, 3 Sep 2025 23:44:59 +0200
Marek Vasut <marek.va...@mailbox.org> wrote:

> On 3/25/25 3:52 PM, Boris Brezillon wrote:
>
> Hello Boris,
>
> sorry for the late reply.
>
> >>>>>>> Hm, that might be the cause of the fast reset issue (which is a fast
> >>>>>>> resume more than a fast reset BTW): if you re-assert the reset line on
> >>>>>>> runtime suspend, I guess this causes a full GPU reset, and the MCU ends
> >>>>>>> up in a state where it needs a slow reset (all data sections reset to
> >>>>>>> their initial state). Can you try to move the reset_control_[de]assert
> >>>>>>> to the unplug/init functions?
> >>>>>> Is it correct to assume , that if I remove all reset_control_assert()
> >>>>>> calls (and keep only the _deassert() calls), the slow resume problem
> >>>>>> should go away too ?
> >>>>>
> >>>>> Yeah, dropping the _assert()s should do the trick.
> >>>> Hmmm, no, that does not help. I was hoping maybe NXP can chime in and
> >>>> suggest something too ?
> >>>
> >>> Can you try keep all the clks/regulators/power-domains/... on after
> >>> init, and see if the fast resume works with that. If it does,
> >>> re-introduce one resource at a time to find out which one causes the
> >>> MCU to lose its state.
> >>
> >> I already tried that too . I spent quite a while until I reached that L2
> >> workaround in fact.
> >
> > So, with your RPM suspend/resume being NOPs, it still doesn't work?
> > Unless the FW is doing something behind our back, I don't really see
> > why this would fail on your platform, but not on the rk3588. Are you
> > sure the power domains are kept on at all times. I'm asking, because if
> > you linked all the PDs, the on/off sequence is automatically handled by
> > the RPM core at suspend/resume time.
>
> I revisited this now.
>
> Can you please test the following patch (also attached) on one of your
> devices, and tell me what the status is at the end . The diff sets the
> GLB_HALT bit and then clears it again, which I suspect should first halt
> the GPU and (this is what I am unsure about) then again un-halt/resume
> the GPU ?

It doesn't work like that. What you're describing is like executing
"shutdown" on your terminal and then typing "boot" on the keyboard after
your computer has been shut down.

>
> "
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c
> b/drivers/gpu/drm/panthor/panthor_fw.c
> index 9bf06e55eaeea..57c0d4fd29aa2 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1087,8 +1087,16 @@ void panthor_fw_pre_reset(struct panthor_device
> *ptdev, bool on_hang)
>    struct panthor_fw_global_iface *glb_iface =
> panthor_fw_get_glb_iface(ptdev);
>    u32 status;
>
> +pr_err("%s[%d] pre-halt status=%x\n", __func__, __LINE__,
> gpu_read(ptdev, MCU_STATUS));
> +
>    panthor_fw_update_reqs(glb_iface, req, GLB_HALT, GLB_HALT);
>    gpu_write(ptdev, CSF_DOORBELL(CSF_GLB_DOORBELL_ID), 1);
> +mdelay(100);
> +pr_err("%s[%d] likely-halted status=%x\n", __func__, __LINE__,
> gpu_read(ptdev, MCU_STATUS));
> +  panthor_fw_update_reqs(glb_iface, req, 0, GLB_HALT);
> +mdelay(100);
> +pr_err("%s[%d] likely-running ? status=%x\n", __func__, __LINE__,
> gpu_read(ptdev, MCU_STATUS));
> +
>    if (!gpu_read_poll_timeout(ptdev, MCU_STATUS, status,
>        status == MCU_STATUS_HALT, 10,
>        100000)) {
> "
>
> In my case, the relevant output looks like this:
>
> "
> [    3.326805] panthor_fw_pre_reset[1090] pre-halt status=1
> [    3.432151] panthor_fw_pre_reset[1095] likely-halted status=2
> [    3.542179] panthor_fw_pre_reset[1098] likely-running ? status=2
> "
>
> That means, the GPU remains halted at the end, even if the "GLB_HALT"
> bit is cleared before the last print. The clearing of GLB_HALT is also
> what panthor_fw_post_reset() does.

After the halt has been processed by the FW, the memory region where you
check the halt status again is inert, since the micro-controller (MCU)
supposed to update those bits is off at this point. The FW interface is
really just a shared memory region between the CPU and MCU, nothing
more.

>
> I suspect the extra soft reset I did before "un-halted" the GPU and
> allowed it to proceed.

Hm, not quite. I mean, you still need to explicitly boot the MCU after a
reset, which is what the write to MCU_CONTROL [1] does. What the
soft-reset does though, is reset all GPU blocks, including the MCU. This
means the MCU starts from a fresh state when you reach [1]. If I had to
guess, I'd say something is messed up when the GPU is halted, and you
need a soft-reset to recover from that. Unfortunately, I don't know
enough about what your FW is doing to help. Maybe Arm/Freescale could...

>
> I wonder if there is some way to un-halt the GPU using some gpu_write()
> direct register access, is there ?

That's MCU_CONTROL, yes. And it's done here [1] already.

> Maybe the GPU remains halted because
> setting the GLB_HALT stops command stream processing, and the GPU never
> samples the clearing of GLB_HALT and therefore remains halted forever ?

Exactly that, and that's expected.

Regards,

Boris

[1] https://elixir.bootlin.com/linux/v6.16.4/source/drivers/gpu/drm/panthor/panthor_fw.c#L1034
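
For readers following along, the halt/un-halt behaviour discussed above
can be mimicked with a toy user-space C model. This is only a sketch
under loud assumptions: every toy_* name is hypothetical, none of it is
panthor or firmware code, and the status values 1 (running) and 2
(halted) are simply the ones seen in the quoted log. It models the FW
interface as plain shared memory that only a running MCU samples, which
is why clearing the halt request alone changes nothing and an explicit
boot (the role MCU_CONTROL plays in the driver) is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not the panthor definitions. */
#define TOY_GLB_HALT        (1u << 0)
#define TOY_STATUS_ENABLED  1u  /* matches "status=1" in the quoted log */
#define TOY_STATUS_HALT     2u  /* matches "status=2" in the quoted log */

struct toy_gpu {
	uint32_t glb_req;    /* shared memory word written by the CPU */
	uint32_t glb_ack;    /* shared memory word written by the MCU */
	uint32_t mcu_status; /* stand-in for the MCU status register  */
};

/* CPU side: update only the masked bits of the request word. */
static void toy_update_reqs(struct toy_gpu *gpu, uint32_t val, uint32_t mask)
{
	gpu->glb_req = (gpu->glb_req & ~mask) | (val & mask);
}

/* MCU side: requests are sampled only while the MCU is running. */
static void toy_mcu_doorbell(struct toy_gpu *gpu)
{
	if (gpu->mcu_status != TOY_STATUS_ENABLED)
		return; /* halted: nobody reads glb_req any more */

	if (gpu->glb_req & TOY_GLB_HALT) {
		gpu->glb_ack |= TOY_GLB_HALT;
		gpu->mcu_status = TOY_STATUS_HALT;
	}
}

/* CPU side: explicit boot, the toy equivalent of an MCU_CONTROL write. */
static void toy_mcu_boot(struct toy_gpu *gpu)
{
	gpu->mcu_status = TOY_STATUS_ENABLED;
}

/* Replays the sequence of the test patch: set HALT, then clear it. */
static uint32_t toy_halt_then_clear(struct toy_gpu *gpu)
{
	assert(gpu->mcu_status == TOY_STATUS_ENABLED); /* demo starts running */

	toy_update_reqs(gpu, TOY_GLB_HALT, TOY_GLB_HALT);
	toy_mcu_doorbell(gpu);            /* MCU halts itself */

	toy_update_reqs(gpu, 0, TOY_GLB_HALT);
	toy_mcu_doorbell(gpu);            /* ignored: MCU is halted */

	return gpu->mcu_status;
}
```

Starting from a running MCU, toy_halt_then_clear() leaves the status at
the halted value even though the request bit is cleared again; in this
model only toy_mcu_boot() brings the status back to running.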