On Sat, 6 Dec 2025 21:26:12 -0500
Joel Fernandes <[email protected]> wrote:
> Hi Zhi,
>
> On 12/6/2025 7:42 AM, Zhi Wang wrote:
[snip]
>
> boot() already returns -ETIMEDOUT via
> wait_till_halted()->read_poll_timeout().
>
> The wait there is 2 seconds. I assume the scrubber would have
> completed by then.
>
> > +
> > + dev_dbg!(
> > + pdev.as_ref(),
> > + "SEC2 MBOX0: {:#x}, MBOX1{:#x}\n",
> > + mbox0,
> > + mbox1
> > + );
> > +
> > + if !regs::NV_PGC6_BSI_SECURE_SCRATCH_15::read(bar).scrubber_completed() {
> > + return Err(ETIMEDOUT);
>
> So under which situation do you get to this point
> (!scrubber_completed) ? Basically I am not sure if ETIMEDOUT is the
> right error to return here, because boot() already returns ETIMEDOUT
> by waiting for the halt.
>
> If you still want to return ETIMEDOUT here, then it sounds like you're
> waiting for scrubbing beyond the waiting already done by boot(). If
> so, then shouldn't you need to use read_poll_timeout() here?
>
> perhaps something like:
>
> read_poll_timeout(
>     || Ok(regs::NV_PGC6_BSI_SECURE_SCRATCH_15::read(bar).scrubber_completed()),
>     |val: &bool| *val,
>     Delta::from_millis(10),
>     Delta::from_secs(5),
> )?;
>
This is identical to the implementation in OpenRM [1]. According to that
part of the code, I think the scrubber runs as part of the binary's boot
process. Once it signals that the firmware has booted successfully, the
scrubbing should already be done, so no extra wait is needed here. Let me
change this to another errno.

[1] https://github.com/NVIDIA/open-gpu-kernel-modules/blob/a5bfb10e75a4046c5d991c65f49b5d29151e68cf/src/nvidia/src/kernel/gpu/gsp/arch/ada/kernel_gsp_ad102.c#L49
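
To make the two options in this thread concrete, here is a rough standalone
sketch. It is plain userspace Rust, not kernel code: `poll_until`,
`check_scrubber` and the `Error` enum are hypothetical stand-ins, and mapping
the one-shot failure to something like EIO is only an assumption, since the
replacement errno has not been chosen yet.

```rust
use std::time::{Duration, Instant};

/// Hypothetical error codes standing in for kernel errnos.
#[derive(Debug, PartialEq)]
enum Error {
    /// Stands in for ETIMEDOUT.
    TimedOut,
    /// Stands in for some other errno such as EIO (an assumption).
    Io,
}

/// Option A: keep waiting. Userspace model of a poll-with-timeout helper
/// that calls `op` until `cond` accepts the value or `timeout` elapses.
fn poll_until<T>(
    mut op: impl FnMut() -> T,
    cond: impl Fn(&T) -> bool,
    sleep: Duration,
    timeout: Duration,
) -> Result<T, Error> {
    let start = Instant::now();
    loop {
        let val = op();
        if cond(&val) {
            return Ok(val);
        }
        if start.elapsed() >= timeout {
            return Err(Error::TimedOut);
        }
        std::thread::sleep(sleep);
    }
}

/// Option B: check once. The boot step has already waited for the halt,
/// so an unset completion flag is reported as a failure, not a timeout.
fn check_scrubber(completed: bool) -> Result<(), Error> {
    if completed {
        Ok(())
    } else {
        Err(Error::Io)
    }
}
```

Option B matches the reasoning above: if the scrubber finishes as part of
booting, there is nothing left to wait for, and only the error code needs to
differ from the timeout already reported by boot().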
> Thanks.
>