On Mon, 18 May 2026 16:33:20 -0700
Chia-I Wu <[email protected]> wrote:
> > > > > > >  	if (!ptdev->scheduler)
> > > > > > >  		return;
> > > > > > >
> > > > > > > -	atomic_or(events, &ptdev->scheduler->fw_events);
> > > > > > > -	sched_queue_work(ptdev->scheduler, fw_events);
> > > > > > > +	guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
> > > > > > > +
> > > > > > > +	if (events & JOB_INT_GLOBAL_IF) {
> > > > > > > +		sched_process_global_irq_locked(ptdev);
> > > > > > > +		events &= ~JOB_INT_GLOBAL_IF;
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	while (events) {
> > > > > > > +		u32 csg_id = ffs(events) - 1;
> > > > > > > +
> > > > > > > +		sched_process_csg_irq_locked(ptdev, csg_id);
> > > > > > > +		events &= ~BIT(csg_id);
> > > > > > > +	}
> > > > > This handles all fw events in the irq context. Are there concerns that
> > > > > it may take too long? I might be wrong, but it seems possible to
> > > > > handle only CSG_SYNC_UPDATE and defer the rest as before.
> > > >
> > > > I started with just the SYNC_UPDATE processing done in the hard-irq
> > > > context, but after auditing the other stuff done in the handler, I
> > > > realized it's basically just deferring all actual processing to work
> > > > items. Yes, there's the overhead of demuxing the events from the
> > > > ack/req regs, but part of this is already done to get to SYNC_UPDATE
> > > > anyway, so at this point we're probably better off demuxing everything
> > > > > and scheduling work items for all kinds of events.
> > > >
> > > > > I also compared the performance of the two approaches (though I didn't
> > > > do as much testing as I did with the new version, so I might have
> > > > missed something), and it didn't seem to matter at all, because the
> > > > interrupts we receive the most are SYNC_UPDATE and IDLE events, and
> > > > those are at the same level.
> > > Looking at ftrace irq events, when there is one active csg,
> > > panthor-job takes 6us (median) / 17us (95%) / 27us (slowest).
> > >
> > > I don't have a good sense if that's considered normal in hardirq. But
> > > if that is ever an issue, and if the majority of the time is spent in
> > > CSG_SYNC_UPDATE anyway, we can always revert the last patch to move
> > > > processing back to the threaded handler.
> >
> > Actually, the threaded -> hard transition (patch 9) is where the perf
> > gain is.
> Hardirq is even more timely, for sure. For our use case, the threaded
> handler is RT and is also good enough.
Yeah, true. I forgot you were forcing RT priority on threaded handlers.
Anyway, let's stick to hardirqs for now, and revisit if it turns out we
do too much work in irq context.
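
For anyone following along, here is a minimal userspace sketch of the
demux order used in the hunk quoted above: the global interface bit is
handled first, then each remaining set bit is treated as a CSG index,
lowest first. It is only an illustration, not the driver code. The
demux_stats struct and demux_events() are hypothetical names, the
counters stand in for the sched_process_*_irq_locked() calls, and
placing JOB_INT_GLOBAL_IF at bit 31 is an assumption made for the
sketch.

```c
#include <stdint.h>
#include <strings.h>	/* ffs() */

/* Assumption for this sketch: the global interface is flagged by bit 31. */
#define JOB_INT_GLOBAL_IF	(1u << 31)
#define MAX_CSGS		31

/* Hypothetical counters standing in for the real per-event handlers. */
struct demux_stats {
	unsigned int global_hits;
	unsigned int csg_hits[MAX_CSGS];
};

static void demux_events(uint32_t events, struct demux_stats *st)
{
	/* Handle the global interface first and clear its bit. */
	if (events & JOB_INT_GLOBAL_IF) {
		st->global_hits++;
		events &= ~JOB_INT_GLOBAL_IF;
	}

	/* Each remaining bit is a CSG slot; walk them lowest bit first. */
	while (events) {
		/* Bit 31 is already cleared, so the cast to int is safe. */
		unsigned int csg_id = (unsigned int)ffs((int)events) - 1;

		st->csg_hits[csg_id]++;
		events &= ~(1u << csg_id);
	}
}
```

Each set bit costs one loop iteration, so the time spent in the handler
grows with the number of CSGs that raised an event rather than with the
width of the register.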