>>-----Original Message-----
>>From: Imre Deak <imre.d...@intel.com>
>>Sent: Tuesday, August 5, 2025 6:20 PM
>>To: Dmitry Baryshkov <dmitry.barysh...@oss.qualcomm.com>
>>Cc: Huhulea, Nicusor Liviu (FT FDS CES LX PBU 1)
>><nicusor.huhu...@siemens.com>; sta...@vger.kernel.org; dri-
>>de...@lists.freedesktop.org; intel-...@lists.freedesktop.org;
>>cip-...@lists.cip-
>>project.org; shradhagu...@linux.microsoft.com; jouni.hogan...@intel.com;
>>neil.armstr...@linaro.org; jani.nik...@linux.intel.com;
>>maarten.lankho...@linux.intel.com; mrip...@kernel.org; tzimmerm...@suse.de;
>>airl...@gmail.com; dan...@ffwll.ch; joonas.lahti...@linux.intel.com;
>>rodrigo.v...@intel.com; laurentiu.pa...@oss.nxp.com; Hombourger, Cedric (FT
>>FDS CES LX) <cedric.hombour...@siemens.com>; Bobade, Shrikant Krishnarao
>>(FT FDS CES LX PBU 1) <shrikant.bob...@siemens.com>
>>Subject: Re: [PATCH] drm/probe-helper: fix output polling not resuming after
>>HPD
>>IRQ storm
>>
>>On Tue, Aug 05, 2025 at 01:46:51PM +0300, Dmitry Baryshkov wrote:
>>> On Mon, Aug 04, 2025 at 11:13:59PM +0300, Nicusor Huhulea wrote:
>>> > A regression in output polling was introduced by commit
>>> > 4ad8d57d902fbc7c82507cfc1b031f3a07c3de6e
>>> > ("drm: Check output polling initialized before disabling") in the 6.1.y
>>> > stable
>>tree.
>>> > As a result, when the i915 driver detects an HPD IRQ storm and
>>> > attempts to switch from IRQ-based hotplug detection to polling, output
>>> > polling
>>fails to resume.
>>> >
>>> > The root cause is the use of dev->mode_config.poll_running. Once
>>> > poll_running is set (during the first connector detection) the calls
>>> > to drm_kms_helper_poll_enable(), such as
>>> > intel_hpd_irq_storm_switch_to_polling() fails to schedule
>>> > output_poll_work as
>>expected.
>>> > Therefore, after an IRQ storm disables HPD IRQs, polling does not start,
>>breaking hotplug detection.
>>>
>>> Why doesn't disable path use drm_kms_helper_poll_disable() ?
>>
>>In general i915 doesn't disable polling as a whole after an IRQ storm and a 2
>>minute delay (or runtime resuming), since on some other connectors the polling
>>may be still required.
>>
>>Also, in the 6.1.y stable tree drm_kms_helper_poll_disable() is:
>>
>> if (drm_WARN_ON(dev, !dev->mode_config.poll_enabled))
>> return;
>>
>> cancel_delayed_work_sync(&dev->mode_config.output_poll_work);
>>
>>so calling that wouldn't really fix the problem, which is clearly just an
>>incorrect
>>backport of the upstream commit 5abffb66d12bcac8 ("drm:
>>Check output polling initialized before disabling") to the v6.1.y stable tree
>>by
>>commit 4ad8d57d902f ("drm: Check output polling initialized before
>>disabling") in
>>v6.1.y.
>>
>>The upstream commit did not add the check for
>>dev->mode_config.poll_running in drm_kms_helper_poll_enable(), the
>>condition was only part of the diff context in the commit. Hence adding the
>>condition in the 4ad8d57d902f backport commit was incorrect.
>>
>>> > The fix is to remove the dev->mode_config.poll_running in the check
>>> > condition, ensuring polling is always scheduled as requested.
I'm not a full-time kernel developer, but I spent some time trying to really
understand both the rationale and the effects of commit "Check output polling
initialized before disabling."
Here's how I see the issue:
That commit was introduced mainly as a defensive measure, to protect drivers
such as hyperv-drm that do not initialize connector polling. In those drivers,
calling drm_kms_helper_poll_disable() without proper polling setup could
trigger warnings or even stack traces, since there are no outputs to poll and
the polling helpers don't apply.
>From what I understand, the poll_running variable was added to prevent cases
>where polling was accidentally disabled for drivers that were never set up for
>it. However, this fix ended up creating a new class of breakage, which I have
>observed and describe here.
To me, the core logic should be simple: if polling is needed, then we should
allow it to proceed (regardless of whether it's been scheduled previously).
Looking at the drm_kms_helper_poll_disable()
if (drm_WARN_ON(dev, !dev->mode_config.poll_enabled))
return;
If the driver never enabled polling (that is, drm_kms_helper_poll_enable() was
never called), then the disable operation is effectively a no-op-totally safe
for drivers like hyperv-drm.
And in the enable function drm_kms_helper_poll_enable(..):
if (drm_WARN_ON_ONCE(dev, !dev->mode_config.poll_enabled) ||
- !drm_kms_helper_poll || dev->mode_config.poll_running)
+ !drm_kms_helper_poll)
return;
The main thing being guarded here is whether polling has actually been
initialized:
1.For polling drivers like i915, removing poll_running from the enable path is
both safe and necessary: it fixes the regression with HPD polling after IRQ
storms
2.For non-polling drivers like hyperv-drm, the existing checks on poll_enabled
in both enable and disable functions are sufficient. Removing poll_running
doesn't affect these drivers-they don't go through the polling code paths, so
no polling gets scheduled or canceled by mistake
Therefore based on my understanding and testing removing poll_running guard not
only fixes a real bug but also does not introduce new regressions for drivers
that don't use output polling.
Nicu
>>> >
>>> > Notes:
>>> > Initial analysis, assumptions, device testing details, the correct
>>> > fix and detailed rationale were discussed here
>>> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flo
>>> > re.kernel.org%2Fstable%2FaI32HUzrT95nS_H9%40ideak-
>>desk%2F&data=05%7C
>>> >
>>02%7Cnicusor.huhulea%40siemens.com%7Ca68c32efd9a044f7f86308ddd4338f8
>>> >
>>8%7C38ae3bcd95794fd4addab42e1495d55a%7C1%7C0%7C638900040113270385
>>%7C
>>> >
>>Unknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwM
>>CIsIl
>>> >
>>AiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=
>>mH
>>> > ublqHV5x2K%2BLZ3ylYHsQhtLbcQSwSwZxiqBl2KohM%3D&reserved=0
>>> >
>>>
>>> --
>>> With best wishes
>>> Dmitry