On 10/9/25 21:22, Ian Rogers wrote:
On Wed, Oct 8, 2025 at 11:50 PM Jinchao Wang <[email protected]> wrote:

On Tue, Oct 07, 2025 at 05:11:52PM -0700, Ian Rogers wrote:
On Tue, Oct 7, 2025 at 3:58 PM Doug Anderson <[email protected]> wrote:

Hi,

On Tue, Oct 7, 2025 at 3:45 PM Ian Rogers <[email protected]> wrote:

On Tue, Oct 7, 2025 at 2:43 PM Doug Anderson <[email protected]> wrote:
...
The buddy watchdog was pretty much following the conventions that were
already in the code: the hardlockup detector (whether backed by perf or
not) was essentially called the "nmi watchdog". A number of people were
involved in the reviews, and I don't believe creating a whole different
mechanism for enabling / disabling the buddy watchdog was ever
suggested.

I suspect they lacked the context that a 1 in nmi_watchdog is taken
to mean there is a perf event in use by the kernel, with implications
for how group events behave. This behavior has been user
visible/advertised for 9 years. I don't doubt that there were good
intentions behind PowerPC's watchdog and the buddy watchdog patches in
using the file, but that use will lead to spurious warnings and
behavior from perf.

My points remain:
1) using multiple files regresses perf's performance;
2) the file's name no longer matches what it actually means;
3) old perf tools on new kernels won't behave as expected wrt warnings
and metrics because the meaning of the file has changed.
Using a separate file for each watchdog resolves this. It seems that
there wasn't enough critical mass for getting this right to have
mattered before, but that doesn't mean we shouldn't get it right now.

Presumably your next steps then are to find someone to submit a patch
and try to convince others on the list that this is a good idea. The
issue with perf has been known for a while now and I haven't seen any
patches. As I've said, I won't stand in the way if everyone else
agrees, but given that I'm still not convinced, I'm not going to author
any patches for this myself.

Writing >1 of:
```
static struct ctl_table watchdog_hardlockup_sysctl[] = {
	{
		.procname	= "nmi_watchdog",
		.data		= &watchdog_hardlockup_user_enabled,
		.maxlen		= sizeof(int),
		.mode		= 0444,
		.proc_handler	= proc_nmi_watchdog,
		.extra1		= SYSCTL_ZERO,
		.extra2		= SYSCTL_ONE,
	},
};
```
is an exercise in copy-and-paste; if you need me to do the copying and
pasting then that is okay.
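
For illustration only, here is a minimal sketch of what a second,
separately named control might look like; the "buddy_watchdog"
procname, the variable and the handler below are hypothetical and do
not exist in the kernel today:
```
/*
 * Hypothetical sketch: a separate sysctl per hardlockup backend, so that
 * nmi_watchdog can keep meaning "a perf counter is in use by the kernel".
 * None of these identifiers exist in the current kernel.
 */
static int watchdog_buddy_user_enabled = 1;

static struct ctl_table watchdog_buddy_sysctl[] = {
	{
		.procname	= "buddy_watchdog",
		.data		= &watchdog_buddy_user_enabled,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec_minmax, /* or a dedicated handler */
		.extra1		= SYSCTL_ZERO,
		.extra2		= SYSCTL_ONE,
	},
};
```
Registering it with something like register_sysctl("kernel",
watchdog_buddy_sysctl) would give each watchdog its own file without
overloading nmi_watchdog.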
Could we find out directly from the perf subsystem whether a perf
event is already in use? There may be (or will be) other kernel users
of perf_event besides the NMI watchdog. Exposing that state from the
perf side would avoid coupling unrelated users through nmi_watchdog
and similar features.

For regular processes there is this unmerged proposal:
https://lore.kernel.org/lkml/[email protected]/
but it doesn't say whether the counter is pinned - the NMI watchdog's
counter is pinned so that it has higher priority than flexible regular
counters, which may be multiplexed. I don't believe there is anything
to say whether the kernel has taken a performance counter. In general,
something else taking a performance counter is okay, as the kernel
will multiplex the counter or groups of counters.
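
To make the pinning concrete, here is a simplified sketch of how a
kernel user such as the hardlockup detector obtains a pinned per-CPU
cycles counter (modelled loosely on kernel/watchdog_perf.c; the
function and callback names below are made up for the example):
```
#include <linux/perf_event.h>

/*
 * Simplified sketch: a pinned event is scheduled ahead of "flexible"
 * events, which is why it can permanently consume one hardware counter
 * on each CPU for as long as the watchdog is enabled.
 */
static struct perf_event_attr wd_attr_sketch = {
	.type		= PERF_TYPE_HARDWARE,
	.config		= PERF_COUNT_HW_CPU_CYCLES,
	.size		= sizeof(struct perf_event_attr),
	.pinned		= 1,	/* higher priority than flexible counters */
	.disabled	= 1,
};

/* Hypothetical overflow callback, not a real kernel symbol. */
static void wd_overflow_sketch(struct perf_event *event,
			       struct perf_sample_data *data,
			       struct pt_regs *regs)
{
	/* check for a hard lockup here */
}

/* Hypothetical helper: create the pinned counter on one CPU. */
static struct perf_event *create_watchdog_counter_sketch(int cpu, u64 period)
{
	wd_attr_sketch.sample_period = period;
	return perf_event_create_kernel_counter(&wd_attr_sketch, cpu, NULL,
						wd_overflow_sketch, NULL);
}
```
Because the event is pinned, it effectively removes one hardware
counter from each CPU, which regular flexible counters then have to be
multiplexed around.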

The particular issue for the NMI watchdog counter is that groups of
events are tested to see if they fit on a PMU; the perf event open
will fail when a group isn't possible, and the events will then be
reprogrammed by the perf tool without a group. When the group is
tested, the PMU has always assumed that all counters are available,
which of course with the NMI watchdog they are not. This results in
the NMI watchdog causing a group of events to be created that can
never be scheduled.
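
A rough userspace sketch of that fallback, assuming a trivial
perf_event_open() wrapper (error handling and the ioctl enable/disable
plumbing are omitted):
```
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

/* Open one hardware event, optionally as a member of group_fd's group. */
static int open_event(__u64 config, int group_fd)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof(attr);
	attr.config = config;

	return syscall(__NR_perf_event_open, &attr, 0 /* this thread */,
		       -1 /* any CPU */, group_fd, 0UL);
}

int main(void)
{
	int leader = open_event(PERF_COUNT_HW_CPU_CYCLES, -1);
	int member = open_event(PERF_COUNT_HW_INSTRUCTIONS, leader);

	if (member < 0) {
		/* Group rejected at open time: reopen the event on its own,
		 * letting the kernel multiplex it around other users. */
		member = open_event(PERF_COUNT_HW_INSTRUCTIONS, -1);
	}
	printf("leader fd %d, member fd %d\n", leader, member);
	return 0;
}
```
The trouble described above is the case where this fallback never
fires: because group validation assumes every hardware counter is
free, a group that needs all of them still opens successfully while
the NMI watchdog is holding one, and the group then never gets
scheduled at runtime.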

Addressing the PMU assumption that all counters are available would
resolve the issue. If perf managed reserved or pinned counters
internally, other users would not need to be aware of that detail.

Alternatively, perf could provide an interface to query whether a
counter is pinned. Having the NMI watchdog supply that information
creates coupling between otherwise independent subsystems.
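
As a purely hypothetical sketch of that kind of interface (none of
these symbols exist in perf today), perf core could keep a per-CPU
count of kernel-pinned counters and expose a query for it:
```
#include <linux/percpu.h>
#include <linux/perf_event.h>

/*
 * Hypothetical only -- not an existing perf API.  perf core would update
 * this whenever a kernel user creates or releases a pinned hardware
 * counter.
 */
static DEFINE_PER_CPU(unsigned int, nr_kernel_pinned_counters);

/* Proposed query for PMU group validation (or for exposing to tools). */
unsigned int perf_event_kernel_pinned_counters(int cpu)
{
	return per_cpu(nr_kernel_pinned_counters, cpu);
}
```
Group validation could then subtract that count instead of assuming an
empty PMU, and tools would no longer need to infer kernel counter
usage from /proc/sys/kernel/nmi_watchdog.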


Thanks,
Ian




-Doug


--
Jinchao



