On 6/11/2026 5:10 PM, Stephan Gerhold wrote:
On Thu, Jun 11, 2026 at 11:10:25AM +0800, Aiqun(Maria) Yu wrote:
On 5/22/2026 8:07 PM, Stephan Gerhold wrote:
On Tue, May 19, 2026 at 12:24:23AM -0700, Jingyi Wang wrote:
Subsystems can be brought out of reset by entities such as bootloaders.
Since IRQs may be enabled only after the subsystem has been brought up,
the subsystem state should be determined by reading the SMP2P bits.

A new qcom_pas_attach() function is introduced. If a crash state is
detected for the subsystem, rproc_report_crash() is called. If the ready
state is detected, the subsystem is marked as "attached". Otherwise, the
early boot feature may not be supported by the other entities; in this
case, the state is marked as RPROC_OFFLINE so that the PAS driver can
load the firmware and start the remoteproc.

Co-developed-by: Gokul Krishna Krishnakumar <[email protected]>
Signed-off-by: Gokul Krishna Krishnakumar <[email protected]>
Signed-off-by: Jingyi Wang <[email protected]>

Unfortunately, removing the ping-pong functionality that was present in
previous patch versions makes the whole mechanism a lot more fragile.
I'm not entirely sure if this has changed in SMP2P v2 or more recent
firmware versions, but in my experience the SMP2P "ready" bit does not
tell you if the remoteproc is actually running. The problem is that the
"ready" bit is asserted by the remoteproc when the firmware is ready,
but it is not cleared when you shutdown or forcibly stop the remoteproc.

If this is still the case, you can easily reproduce that with the
following test:

  1. Start the system as usual and let it attach the remoteproc
  2. Manually stop the remoteproc in sysfs (echo stop > state)
  3. modprobe -r qcom_q6v5_pas
  4. modprobe qcom_q6v5_pas
  5. If the "ready" bit is still set, the driver will try attaching the
     remoteproc, but it's actually not running. No recovery will happen.

In this situation, it is very difficult to detect the correct remoteproc
state without relying on an additional query mechanism like the
ping-pong feature.

This is a valid use case and concern. We discussed it with Bjorn and
would like to address this scenario in the separate robustness
improvement series [1].
Stephan, would you agree to let the basic functionality in this series
go in first?

[1]
https://lore.kernel.org/all/[email protected]/


You can make it a bit more reliable if you also check the status of the
"stop-ack" bit. This would tell you if the remoteproc was cleanly
stopped with the SMP2P "stop" mechanism. However, that will typically
still not fix the case above since nowadays remoteprocs are typically
stopped via the QMI qcom_sysmon and the "stop-ack" is not set in that
case. I believe it might instead set the separate "shutdown-ack" bit
that is described for some SoCs, but I never finished testing that.

And even if you check both "stop-ack" and "shutdown-ack", that doesn't
tell you if the remoteproc was forcibly killed using
qcom_scm_pas_shutdown() without gracefully stopping it first. The ideal
solution would be querying the PAS API to tell us if the remoteproc is
actively running, but the last time I checked I was unfortunately not
able to find a documented call that would tell us that.

In the ready==1 && error==0 && ping-pong==0 scenario, the kernel
currently cannot tell whether the remoteproc is offline or crashed.
After the module is reloaded (modprobe), the software has no state of
its own, so as far as I understand only the firmware can tell us
whether the remoteproc is active.

Perhaps we could discuss this scenario and its solution in the other
series I mentioned.


If you add a new feature upstream, you must make sure that it is
reasonably robust and reliable. The other series is about generic
limitations in the remoteproc subsystem, so I don't think you should
move QC-specific parts over there as well (personally, I would have
probably kept all of it in one series to make it easier to understand,
but that's subjective).

With the current firmware design, it's hard (probably impossible) to
make the status detection perfectly reliable. I would therefore choose
some reasonable compromise to start with. Given that Shawn (and actually
me as well) would like to have attach working without firmware support
for the ping-pong functionality, I think it would be reasonable to start
with the basic detection scheme discussed above, i.e.

   ready==1 && handover==1 && fatal==0 && stop-ack==0 && shutdown-ack==0


Hi Stephan,

We ran local tests; checking stop-ack==0 && shutdown-ack==0 should be
able to cover the graceful shutdown cases.

Would it be redundant to additionally check the handover state here? In
our observation, the ready and handover bits are usually set together.
Also, the handover IRQ is not a necessary condition for PAS start.

Thanks,
Jingyi


The ping-pong functionality could be added later for platforms that
support it. It would be good to have the interrupts already defined in
the device tree, so you can tweak the driver without making DT changes
later.

Thanks,
Stephan

