On 12/24/2025 6:15 AM, Stephan Gerhold wrote:
> On Tue, Dec 23, 2025 at 01:13:50AM -0800, Jingyi Wang wrote:
>> From: Gokul krishna Krishnakumar <[email protected]>
>>
>> Subsystems can be brought out of reset by entities such as bootloaders.
>> As the irq enablement could be later than subsystem bring up, the state
>> of subsystem should be checked by reading SMP2P bits and performing ping
>> test.
>>
>> A new qcom_pas_attach() function is introduced. if a crash state is
>> detected for the subsystem, rproc_report_crash() is called. If the
>> subsystem is ready either at the first check or within a 5-second timeout
>> and the ping is successful, it will be marked as "attached". The ready
>> state could be set by either ready interrupt or handover interrupt.
>>
>> If "early_boot" is set by kernel but "subsys_booted" is not completed
>> within the timeout, It could be the early boot feature is not supported
>> by other entities. In this case, the state will be marked as RPROC_OFFLINE
>> so that the PAS driver can load the firmware and start the remoteproc. As
>> the running state is set once attach function is called, the watchdog or
>> fatal interrupt received can be handled correctly.
>>
>> Signed-off-by: Gokul krishna Krishnakumar 
>> <[email protected]>
>> Co-developed-by: Jingyi Wang <[email protected]>
>> Signed-off-by: Jingyi Wang <[email protected]>
>> ---
>>  drivers/remoteproc/qcom_q6v5.c      | 87 ++++++++++++++++++++++++++++++++-
>>  drivers/remoteproc/qcom_q6v5.h      | 11 ++++-
>>  drivers/remoteproc/qcom_q6v5_adsp.c |  2 +-
>>  drivers/remoteproc/qcom_q6v5_mss.c  |  2 +-
>>  drivers/remoteproc/qcom_q6v5_pas.c  | 97 
>> ++++++++++++++++++++++++++++++++++++-
>>  drivers/remoteproc/qcom_q6v5_wcss.c |  2 +-
>>  6 files changed, 195 insertions(+), 6 deletions(-)
>>
>> [...]
>> diff --git a/drivers/remoteproc/qcom_q6v5_pas.c 
>> b/drivers/remoteproc/qcom_q6v5_pas.c
>> index 52680ac99589..7e890e18dd82 100644
>> --- a/drivers/remoteproc/qcom_q6v5_pas.c
>> +++ b/drivers/remoteproc/qcom_q6v5_pas.c
>> [...]
>> @@ -434,6 +440,85 @@ static unsigned long qcom_pas_panic(struct rproc *rproc)
>>      return qcom_q6v5_panic(&pas->q6v5);
>>  }
>>  
>> +static int qcom_pas_attach(struct rproc *rproc)
>> +{
>> +    int ret;
>> +    struct qcom_pas *pas = rproc->priv;
>> +    bool ready_state;
>> +    bool crash_state;
>> +
>> +    pas->q6v5.running = true;
>> +    ret = irq_get_irqchip_state(pas->q6v5.fatal_irq,
>> +                                IRQCHIP_STATE_LINE_LEVEL, &crash_state);
>> +
>> +    if (ret)
>> +            goto disable_running;
>> +
>> +    if (crash_state) {
>> +            dev_err(pas->dev, "Sub system has crashed before driver 
>> probe\n");
>> +            rproc_report_crash(rproc, RPROC_FATAL_ERROR);
> 
> Have you tested this case? From quick review of the code in
> remoteproc_core.c I'm skeptical if this will work correctly:
> 
>  1. Remoteproc is in RPROC_DETACHED state during auto boot
>  2. qcom_pas_attach() runs and calls rproc_report_crash(), then fails so
>     RPROC_DETACHED state remains
>  3. rproc_crash_handler_work() sets RPROC_CRASHED and starts recovery
>  4. rproc_boot_recovery() calls rproc_stop()
>  5. rproc_stop() calls rproc_stop_subdevices(), but because the
>     remoteproc was never attached, the subdevices were never started.
> 
> In this situation, rproc_stop_subdevices() should not be called. I would
> expect you will need to make some minor changes to the remoteproc_core
> to support handling crashes during RPROC_DETACHED state.
> 
> I might be reading the code wrong, but please make sure that you
> simulate this case at runtime and check that it works correctly.
> 

Recheked this part, current path has some issue on recovery and subdev handling.
As in current code, rproc_report_crash is called in the ISR/callback, it may
take some effort to call it in this attach path.


>> +            ret = -EINVAL;
>> +            goto disable_running;
>> +    }
>> +
>> +    ret = irq_get_irqchip_state(pas->q6v5.ready_irq,
>> +                                IRQCHIP_STATE_LINE_LEVEL, &ready_state);
>> +
>> +    if (ret)
>> +            goto disable_running;
>> +
>> +    enable_irq(pas->q6v5.handover_irq);
>> +
>> +    if (unlikely(!ready_state)) {
>> +            /* Set a 5 seconds timeout in case the early boot is delayed */
>> +            ret = wait_for_completion_timeout(&pas->q6v5.subsys_booted,
>> +                                              
>> msecs_to_jiffies(EARLY_ATTACH_TIMEOUT_MS));
>> +
> 
> Again, have you tested this case?
> 
> As I already wrote in v2, I don't see how this case will work reliably
> in practice. How do you ensure that the handover resources will be kept
> on during the Linux boot process until the remoteproc has completed
> booting?
> 
> Also, above you enable the handover_irq. Let's assume a handover IRQ
> does come in while you are waiting here. Then q6v5_handover_interrupt()
> will call q6v5->handover(q6v5); to disable the handover resources
> (clocks, power domains), but you never enabled those. I would expect
> that you get some bad reference count warnings in the kernel log.
> 
> I would still suggest dropping this code entirely. As far as I
> understand the response from Aiqun(Maria) Yu [1], there is no real use
> case for this on current platforms. If you want to keep this, you would
> need to vote for the handover resources during probe() (and perhaps
> more, this case is quite tricky).
> 
> Please test all your changes carefully in v4.
> 

Thanks for your detailed review, the handover resources was indeed
neglected in the design, we will re-evaluate this part.

> Thanks,
> Stephan
> 
> [1]: 
> https://lore.kernel.org/linux-arm-msm/[email protected]/

Thanks,
Jingyi

Reply via email to