On Sat, Nov 14, 2015 at 1:44 PM, Karol Herbst <nouv...@karolherbst.de> wrote: > I encountered while stresstesting the reclocking code, that rarely (1 out of > 20.000+ requests) we don't get any IRQ in nvkm_pmu_intr. > > This means we have a queued message on the pmu, but nouveau doesn't read it > and > waits infinitely in nvkm_pmu_send: > if (reply) { > wait_event(pmu->recv.wait, (pmu->recv.process == 0)); > > therefore let us use wait_event_timeout with a 1s timeout frame and just check > whether there is a message queued and handle it if there is one. > > Return -ETIMEDOUT whenever we timed out and there is no message queued or when > we hit another timeout while trying to read the message without getting any > IRQ > > The benefit of not using wait_event is, that we don't have a kworker waiting > on an event, which makes it easier to reload the module at runtime, which > helps > me developing on nouveau on my laptop a lot, because I don't need to reboot > anymore > > Nethertheless, we shouldn't use wait_event here, because we can't guarantee > any > answere at all, can we? > > v2: moved it into a new function > > Signed-off-by: Karol Herbst <nouv...@karolherbst.de> > --- > drm/nouveau/nvkm/subdev/pmu/base.c | 43 > ++++++++++++++++++++++++++++++++------ > 1 file changed, 37 insertions(+), 6 deletions(-) > > diff --git a/drm/nouveau/nvkm/subdev/pmu/base.c > b/drm/nouveau/nvkm/subdev/pmu/base.c > index 6b2007f..fafbe2a 100644 > --- a/drm/nouveau/nvkm/subdev/pmu/base.c > +++ b/drm/nouveau/nvkm/subdev/pmu/base.c > @@ -43,6 +43,41 @@ nvkm_pmu_handle_reclk_request(struct work_struct *work) > nvkm_clk_pmu_reclk_request(clk, pmu->intr.data[0]); > } > > +static int > +wait_for_pmu_reply(struct nvkm_pmu *pmu, u32 reply[2]) > +{ > + struct nvkm_subdev *subdev = &pmu->subdev; > + struct nvkm_device *device = subdev->device; > + unsigned long jiffies = msecs_to_jiffies(1000); > + > + if (!wait_event_timeout(pmu->recv.wait, (pmu->recv.process == 0), > jiffies)) { > + u32 addr = nvkm_rd32(device, 0x10a4cc); > + nvkm_error(subdev, "wait on reply timed out\n"); > + > + if (addr != nvkm_rd32(device, 0x10a4c8)) { > + nvkm_error(subdev, "found queued message without > getting an interrupt\n"); > + schedule_work(&pmu->recv.work); > + > + if (!wait_event_timeout(pmu->recv.wait, > (pmu->recv.process == 0), jiffies)) { > + nvkm_error(subdev, "failed to repair PMU > state\n"); > + goto reply_error; > + } > + } else
Not sure whether kernel style dictates this, but I really hate these "hanging" else's... both sides should have brackets if either one does. > + goto reply_error; > + } > + > + reply[0] = pmu->recv.data[0]; > + reply[1] = pmu->recv.data[1]; > + mutex_unlock(&subdev->mutex); > + return 0; > + > +reply_error: > + reply[0] = 0; > + reply[1] = 0; > + mutex_unlock(&subdev->mutex); > + return -ETIMEDOUT; > +} > + > int > nvkm_pmu_send(struct nvkm_pmu *pmu, u32 reply[2], > u32 process, u32 message, u32 data0, u32 data1) > @@ -88,12 +123,8 @@ nvkm_pmu_send(struct nvkm_pmu *pmu, u32 reply[2], > nvkm_wr32(device, 0x10a580, 0x00000000); > > /* wait for reply, if requested */ > - if (reply) { > - wait_event(pmu->recv.wait, (pmu->recv.process == 0)); > - reply[0] = pmu->recv.data[0]; > - reply[1] = pmu->recv.data[1]; > - mutex_unlock(&subdev->mutex); > - } > + if (reply) > + return wait_for_pmu_reply(pmu, reply); Having one function lock and another unlock is a disaster waiting to happen. Perhaps make wiat_for_pmu_reply not handle the unlock and instead do int ret = 0; if (reply) ret = wait_for_pmu_reply() return ret; Additionally leaving the reply[] filling in this function would allow you to avoid annoying error handling and goto's in the other function. > > return 0; > } > -- > 2.6.3 > > _______________________________________________ > Nouveau mailing list > Nouveau@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/nouveau _______________________________________________ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau