On 10/29/2025 6:03 PM, Stephan Gerhold wrote:
> On Wed, Oct 29, 2025 at 01:05:42AM -0700, Jingyi Wang wrote:
>> From: Gokul krishna Krishnakumar <[email protected]>
>>
>> Subsystems can be brought out of reset by entities such as
>> bootloaders. Before attaching such subsystems, it is important to
>> check the state of the subsystem. This patch adds support to attach
>> to a subsystem by ensuring that the subsystem is in a sane state by
>> reading SMP2P bits and pinging the subsystem.
>>
>> Signed-off-by: Gokul krishna Krishnakumar <[email protected]>
>> Co-developed-by: Jingyi Wang <[email protected]>
>> Signed-off-by: Jingyi Wang <[email protected]>
>> ---
>>  drivers/remoteproc/qcom_q6v5.c      | 89 ++++++++++++++++++++++++++++++++++++-
>>  drivers/remoteproc/qcom_q6v5.h      | 14 +++++-
>>  drivers/remoteproc/qcom_q6v5_adsp.c |  2 +-
>>  drivers/remoteproc/qcom_q6v5_mss.c  |  2 +-
>>  drivers/remoteproc/qcom_q6v5_pas.c  | 63 +++++++++++++++++++++++++-
>>  5 files changed, 165 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/remoteproc/qcom_q6v5.c b/drivers/remoteproc/qcom_q6v5.c
>> index 58d5b85e58cd..4ce9e43fc5c7 100644
>> --- a/drivers/remoteproc/qcom_q6v5.c
>> +++ b/drivers/remoteproc/qcom_q6v5.c
>> [...]
>> @@ -234,6 +246,77 @@ unsigned long qcom_q6v5_panic(struct qcom_q6v5 *q6v5)
>>  }
>>  EXPORT_SYMBOL_GPL(qcom_q6v5_panic);
>>  
>> +static irqreturn_t q6v5_pong_interrupt(int irq, void *data)
>> +{
>> +    struct qcom_q6v5 *q6v5 = data;
>> +
>> +    complete(&q6v5->ping_done);
>> +
>> +    return IRQ_HANDLED;
>> +}
>> +
>> +int qcom_q6v5_ping_subsystem(struct qcom_q6v5 *q6v5)
>> +{
>> +    int ret;
>> +    int ping_failed = 0;
>> +
>> +    reinit_completion(&q6v5->ping_done);
>> +
>> +    /* Set master kernel Ping bit */
>> +    ret = qcom_smem_state_update_bits(q6v5->ping_state,
>> +                                      BIT(q6v5->ping_bit), BIT(q6v5->ping_bit));
>> +    if (ret) {
>> +            dev_err(q6v5->dev, "Failed to update ping bits\n");
>> +            return ret;
>> +    }
>> +
>> +    ret = wait_for_completion_timeout(&q6v5->ping_done, msecs_to_jiffies(PING_TIMEOUT));
>> +    if (!ret) {
>> +            ping_failed = -ETIMEDOUT;
>> +            dev_err(q6v5->dev, "Failed to get back pong\n");
>> +    }
>> +
>> +    /* Clear ping bit master kernel */
>> +    ret = qcom_smem_state_update_bits(q6v5->ping_state, BIT(q6v5->ping_bit), 0);
>> +    if (ret) {
>> +            pr_err("Failed to clear master kernel bits\n");
> 
> dev_err()?
> 
>> +            return ret;
>> +    }
>> +
>> +    if (ping_failed)
>> +            return ping_failed;
> 
> Could just "return ping_failed;" directly.
> 
>> +
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(qcom_q6v5_ping_subsystem);
>> +
>> +int qcom_q6v5_ping_subsystem_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev)
>> +{
>> +    int ret = -ENODEV;
>> +
>> +    q6v5->ping_state = devm_qcom_smem_state_get(&pdev->dev, "ping", &q6v5->ping_bit);
>> +    if (IS_ERR(q6v5->ping_state)) {
>> +            dev_err(&pdev->dev, "failed to acquire smem state %ld\n",
>> +                    PTR_ERR(q6v5->ping_state));
>> +            return ret;
> 
> return PTR_ERR(q6v5->ping_state)?
> 
>> +    }
>> +
>> +    q6v5->pong_irq = platform_get_irq_byname(pdev, "pong");
>> +    if (q6v5->pong_irq < 0)
>> +            return q6v5->pong_irq;
>> +
>> +    ret = devm_request_threaded_irq(&pdev->dev, q6v5->pong_irq, NULL,
>> +                                    q6v5_pong_interrupt, IRQF_TRIGGER_RISING | IRQF_ONESHOT,
>> +                                    "q6v5 pong", q6v5);
>> +    if (ret)
>> +            dev_err(&pdev->dev, "failed to acquire pong IRQ\n");
>> +
>> +    init_completion(&q6v5->ping_done);
> 
> It would be better to have init_completion() before requesting the
> interrupt, to guarantee that complete(&q6v5->ping_done); cannot happen
> before the completion struct is initialized.
> 
>> +
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(qcom_q6v5_ping_subsystem_init);
>> +
>>  /**
>>   * qcom_q6v5_init() - initializer of the q6v5 common struct
>>   * @q6v5:   handle to be initialized
>> @@ -247,7 +330,7 @@ EXPORT_SYMBOL_GPL(qcom_q6v5_panic);
>>   */
>>  int qcom_q6v5_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev,
>>                 struct rproc *rproc, int crash_reason, const char *load_state,
>> -               void (*handover)(struct qcom_q6v5 *q6v5))
>> +               bool early_boot, void (*handover)(struct qcom_q6v5 *q6v5))
>>  {
>>      int ret;
>>  
>> @@ -255,10 +338,14 @@ int qcom_q6v5_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev,
>>      q6v5->dev = &pdev->dev;
>>      q6v5->crash_reason = crash_reason;
>>      q6v5->handover = handover;
>> +    q6v5->early_boot = early_boot;
>>  
>>      init_completion(&q6v5->start_done);
>>      init_completion(&q6v5->stop_done);
>>  
>> +    if (early_boot)
>> +            init_completion(&q6v5->subsys_booted);
>> +
>>      q6v5->wdog_irq = platform_get_irq_byname(pdev, "wdog");
>>      if (q6v5->wdog_irq < 0)
>>              return q6v5->wdog_irq;
>> diff --git a/drivers/remoteproc/qcom_q6v5.h b/drivers/remoteproc/qcom_q6v5.h
>> index 5a859c41896e..8a227bf70d7e 100644
>> --- a/drivers/remoteproc/qcom_q6v5.h
>> +++ b/drivers/remoteproc/qcom_q6v5.h
>> @@ -12,27 +12,35 @@ struct rproc;
>>  struct qcom_smem_state;
>>  struct qcom_sysmon;
>>  
>> +#define PING_TIMEOUT 500 /* in milliseconds */
>> +#define PING_TEST_WAIT 500 /* in milliseconds */
> 
> Why is this defined in the shared header rather than the C file that
> uses this?
> 
> PING_TEST_WAIT looks unused.
> 
>> +
>>  struct qcom_q6v5 {
>>      struct device *dev;
>>      struct rproc *rproc;
>>  
>>      struct qcom_smem_state *state;
>> +    struct qcom_smem_state *ping_state;
>>      struct qmp *qmp;
>>  
>>      struct icc_path *path;
>>  
>>      unsigned stop_bit;
>> +    unsigned int ping_bit;
>>  
>>      int wdog_irq;
>>      int fatal_irq;
>>      int ready_irq;
>>      int handover_irq;
>>      int stop_irq;
>> +    int pong_irq;
>>  
>>      bool handover_issued;
>>  
>>      struct completion start_done;
>>      struct completion stop_done;
>> +    struct completion subsys_booted;
>> +    struct completion ping_done;
>>  
>>      int crash_reason;
>>  
>> @@ -40,11 +48,13 @@ struct qcom_q6v5 {
>>  
>>      const char *load_state;
>>      void (*handover)(struct qcom_q6v5 *q6v5);
>> +
>> +    bool early_boot;
>>  };
>>  
>>  int qcom_q6v5_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev,
>>                 struct rproc *rproc, int crash_reason, const char *load_state,
>> -               void (*handover)(struct qcom_q6v5 *q6v5));
>> +               bool early_boot, void (*handover)(struct qcom_q6v5 *q6v5));
>>  void qcom_q6v5_deinit(struct qcom_q6v5 *q6v5);
>>  
>>  int qcom_q6v5_prepare(struct qcom_q6v5 *q6v5);
>> @@ -52,5 +62,7 @@ int qcom_q6v5_unprepare(struct qcom_q6v5 *q6v5);
>>  int qcom_q6v5_request_stop(struct qcom_q6v5 *q6v5, struct qcom_sysmon *sysmon);
>>  int qcom_q6v5_wait_for_start(struct qcom_q6v5 *q6v5, int timeout);
>>  unsigned long qcom_q6v5_panic(struct qcom_q6v5 *q6v5);
>> +int qcom_q6v5_ping_subsystem(struct qcom_q6v5 *q6v5);
>> +int qcom_q6v5_ping_subsystem_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev);
>>  
>>  #endif
>> [...]
>> diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c
>> index 158bcd6cc85c..b667c11aadb5 100644
>> --- a/drivers/remoteproc/qcom_q6v5_pas.c
>> +++ b/drivers/remoteproc/qcom_q6v5_pas.c
>> @@ -35,6 +35,8 @@
>>  
>>  #define MAX_ASSIGN_COUNT 3
>>  
>> +#define EARLY_BOOT_RETRY_INTERVAL_MS 5000
>> +
>>  struct qcom_pas_data {
>>      int crash_reason_smem;
>>      const char *firmware_name;
>> @@ -59,6 +61,7 @@ struct qcom_pas_data {
>>      int region_assign_count;
>>      bool region_assign_shared;
>>      int region_assign_vmid;
>> +    bool early_boot;
>>  };
>>  
>>  struct qcom_pas {
>> @@ -409,6 +412,8 @@ static int qcom_pas_stop(struct rproc *rproc)
>>      if (pas->smem_host_id)
>>              ret = qcom_smem_bust_hwspin_lock_by_host(pas->smem_host_id);
>>  
>> +    pas->q6v5.early_boot = false;
>> +
>>      return ret;
>>  }
>>  
>> @@ -434,6 +439,51 @@ static unsigned long qcom_pas_panic(struct rproc *rproc)
>>      return qcom_q6v5_panic(&pas->q6v5);
>>  }
>>  
>> +static int qcom_pas_attach(struct rproc *rproc)
>> +{
>> +    int ret;
>> +    struct qcom_pas *adsp = rproc->priv;
>> +    bool ready_state;
>> +    bool crash_state;
>> +
>> +    if (!adsp->q6v5.early_boot)
>> +            return -EINVAL;
>> +
>> +    ret = irq_get_irqchip_state(adsp->q6v5.fatal_irq,
>> +                                IRQCHIP_STATE_LINE_LEVEL, &crash_state);
>> +
>> +    if (crash_state) {
> 
> crash_state will be uninitialized if irq_get_irqchip_state() returns an
> error.

Good catch.
I suggest checking ret. If the fatal_irq state is not available, just
fail the attach; there is no need to attempt crash reporting in that case.
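
As a rough illustration of the ret check, here is a small userspace
sketch. mock_get_irq_state() is a hypothetical stand-in for
irq_get_irqchip_state(), and attach_check_fatal() is a made-up helper
name; neither comes from the patch:

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for irq_get_irqchip_state(); NOT the kernel
 * function. Negative IRQ numbers model a failed lookup, odd IRQ numbers
 * model an asserted line.
 */
static int mock_get_irq_state(int irq, bool *state)
{
	if (irq < 0)
		return -EINVAL;	/* *state is left untouched on error */

	*state = (irq & 1);
	return 0;
}

/* Made-up helper: only report the line level if the lookup succeeded. */
static int attach_check_fatal(int fatal_irq, bool *crashed)
{
	bool crash_state = false;
	int ret;

	ret = mock_get_irq_state(fatal_irq, &crash_state);
	if (ret)
		return ret;	/* cannot read the line: fail the attach */

	*crashed = crash_state;
	return 0;
}
```

The only point is that crash_state is never read when the lookup fails.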

> 
>> +            dev_err(adsp->dev, "Sub system has crashed before driver probe\n");
>> +            adsp->rproc->state = RPROC_CRASHED;
>> +            return -EINVAL;
> 
> Ok, so the subsystem has crashed. Now what? We probably want to restart
> it, but I don't think anyone will handle the RPROC_CRASHED state you are
> setting here.

Agreed. RPROC_CRASHED should be left for the crash handler to set.

> 
> I think it would make more sense to call rproc_report_crash() here. This
> will set RPROC_CRASHED for you and trigger recovery. I'm not sure if
> this works properly in RPROC_DETACHED state, please test to make sure
> that works as intended.

Agreed.
I suggest having:
    q6v5->running = false;
    rproc_report_crash(q6v5->rproc, RPROC_FATAL_ERROR);

This can be tested by hacking the code so that crash_state is always
true here, and then checking that crash recovery works as intended.
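
A userspace sketch of that crash path and of the test hack follows.
Everything in it is hypothetical: struct mock_q6v5, mock_report_crash()
(standing in for rproc_report_crash()) and the force_crash knob are
illustration only, and returning -EINVAL is just carried over from the
original patch:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the bits of state that matter here. */
struct mock_q6v5 {
	bool running;
	int crashes_reported;
};

/* Stand-in for rproc_report_crash(); it only counts the calls. */
static void mock_report_crash(struct mock_q6v5 *q6v5)
{
	q6v5->crashes_reported++;
}

/*
 * Made-up helper for the fatal-line branch of attach.
 * force_crash models the test hack: always take the crash path.
 */
static int attach_handle_fatal(struct mock_q6v5 *q6v5, bool crash_state,
			       bool force_crash)
{
	if (force_crash)
		crash_state = true;	/* test-only hack */

	if (!crash_state)
		return 0;

	/* Leave the RPROC_CRASHED transition to the crash handler. */
	q6v5->running = false;
	mock_report_crash(q6v5);

	return -EINVAL;	/* assumption: same code as the original patch */
}
```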

> 
>> +    }
>> +
>> +    ret = irq_get_irqchip_state(adsp->q6v5.ready_irq,
>> +                                IRQCHIP_STATE_LINE_LEVEL, &ready_state);
>> +
>> +    if (ready_state) {
>> +            dev_info(adsp->dev, "Sub system has boot-up before driver probe\n");
> 
> This message feels redundant, dmesg already shows a different message
> for "attaching" vs "booting" a remoteproc.
> 
>> +            adsp->rproc->state = RPROC_DETACHED;
> 
> What is the point of this assignment? You have already set this state
> inside qcom_pas_probe().

Makes sense.

> 
>> +    } else {
>> +            ret = wait_for_completion_timeout(&adsp->q6v5.subsys_booted,
>> +                                              msecs_to_jiffies(EARLY_BOOT_RETRY_INTERVAL_MS));
>> +            if (!ret) {
>> +                    dev_err(adsp->dev, "Timeout on waiting for subsystem interrupt\n");
>> +                    return -ETIMEDOUT;
>> +            }
> 
> This looks like you want to handle the case where the remoteproc is
> still booting while this code is running (i.e. it has not finished
> booting yet by signaling the ready state). Is this situation actually
> possible with the current firmware design?

This shouldn't happen during the initial boot stage, as far as I understand.

The current remoteproc is required by the bootloader/firmware before the
kernel even starts, so it shouldn't be in a state where it's still
booting at that point. If it were, the early_boot feature wouldn't be
necessary at all.

However, if the remoteproc is attached a second time, especially when
RPROC_FEAT_ATTACH_ON_RECOVERY is enabled, this could occur as a corner
case during boot-up: the remoteproc is auto-booting while the kernel is
trying to attach and check the ready state.

> 
> I don't see how this would reliably work in practice. If firmware boots
> a remoteproc early it should wait until:
> 
>  - Handover is signaled, to ensure the proxy votes are kept
>  - Ready is signaled, to ensure the metadata region remains reserved
> 
> None of this is guaranteed if the firmware gives up control to Linux
> before waiting for the signals.
> 
> I would suggest dropping all the code related to handling the late
> "subsys_booted" completion. If this is needed, can you explain when
> exactly this situation happens and how you guarantee reliable startup of
> the remoteproc?


The Kaanapali-specific remoteproc (soccp) with the early-boot feature
relies on rproc_shutdown()/rproc_boot() for recovery. A not-ready state
here should be a rare corner case, such as a bootloader/firmware bug.
Maybe we can simply remove the "subsys_booted" mechanism and only call
rproc_report_crash() in this corner case.

> 
>> +    }
>> +
>> +    ret = qcom_q6v5_ping_subsystem(&adsp->q6v5);
>> +    if (ret) {
>> +            dev_err(adsp->dev, "Failed to ping subsystem, assuming device crashed\n");
>> +            rproc->state = RPROC_CRASHED;
>> +            return ret;
>> +    }
>> +
>> +    adsp->q6v5.running = true;
> 
> You should probably also set q6v5->handover_issued = true;, otherwise
> qcom_pas_stop() will later drop all the handover votes that you have
> never made. This will break all the reference counting.

Ack for all of the comments above; well understood.
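
To make the tail of the attach path concrete, here is a userspace sketch
of how it might look with the "subsys_booted" wait dropped and the
handover flag set. struct mock_state and attach_finish() are
hypothetical, the crash counter stands in for rproc_report_crash(), and
the -EINVAL code and plain propagation of a ping failure are my
assumptions rather than anything settled in this thread:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the relevant q6v5 state. */
struct mock_state {
	bool running;
	bool handover_issued;
	int crashes_reported;
};

/*
 * Made-up helper: ready_state is the sampled level of the ready line,
 * ping_ret is the result of the ping/pong handshake (0 on success).
 */
static int attach_finish(struct mock_state *s, bool ready_state, int ping_ret)
{
	if (!ready_state) {
		/* No wait on subsys_booted: treat "not ready" as a crash. */
		s->crashes_reported++;	/* stands in for rproc_report_crash() */
		return -EINVAL;		/* assumption: error code not decided */
	}

	if (ping_ret)
		return ping_ret;	/* assumption: just propagate the error */

	s->running = true;
	/* Avoid stop dropping handover votes that were never taken. */
	s->handover_issued = true;

	return 0;
}
```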

> 
> Overall, this patch feels quite fragile in the current state. Please
> make sure you carefully consider all side effects and new edge cases
> introduced by your changes.

As for other edge cases and side effects, maybe Stephan can share more
details.

> 
> Thanks,
> Stephan


-- 
Thx and BRs,
Aiqun(Maria) Yu
