On Mon Jun 1, 2026 at 3:41 AM JST, Timur Tabi wrote: > On Sun, 2026-05-31 at 21:37 +0900, Alexandre Courbot wrote: >> On Turing and Ampere, resetting the GSP involves running two firmware >> images: FWSEC-SB and Booter Unloader. They are independent from one >> another, and we should do whatever is possible to restore the GSP's >> unloaded state even if a failure occurs along the way. >> >> Thus, keep going and run Booter Unloader even if the execution of >> FWSEC-SB failed. >> >> Reported-by: Sashiko <[email protected]> >> Closes: >> https://sashiko.dev/#/patchset/20260529-nova-unload-v7-0-678f39209e00%40nvidia.com?part=3 >> Fixes: adb99ce3cc78 ("gpu: nova-core: run Booter Unloader and FWSEC-SB upon >> unbinding") >> Signed-off-by: Alexandre Courbot <[email protected]> >> --- >> This was caught by Sashiko; I unfortunately noticed it after pushing the >> series, but having it as a follow-up is beneficial regardless as it >> allows more time for review. >> --- >> drivers/gpu/nova-core/gsp/hal/tu102.rs | 18 ++++++++++++++---- >> 1 file changed, 14 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs >> b/drivers/gpu/nova-core/gsp/hal/tu102.rs >> index a033bc892066..b10215190257 100644 >> --- a/drivers/gpu/nova-core/gsp/hal/tu102.rs >> +++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs >> @@ -134,11 +134,19 @@ fn run( >> sec2_falcon: &Falcon<Sec2>, >> ) -> Result { >> // Run FWSEC-SB to reset the GSP falcon to its pre-libos state. >> - self.fwsec_sb.run(dev, bar, gsp_falcon)?; >> + // Log errors but keep going if it fails. >> + let fwsec_sb_res = self >> + .fwsec_sb >> + .run(dev, bar, gsp_falcon) >> + .inspect_err(|e| dev_err!(dev, "FWSEC-SB failed to run: >> {:?}\n", e)); > > Shouldn't this be dev_warn?
I guess that's subjective, but since it is technically an error that is likely to prevent the driver the reload I think `dev_err` is the right level here. > > Also, how did you test this? Have you tried breaking the FWSEC-SB code and > telling > booter_unload run anyway, and seeing if you can still reload the driver? > Sashiko said this: > >> Since FWSEC-SB (running on gsp_falcon) and the Booter Unloader (running > on sec2_falcon) are independent cleanup steps, returning early here bypasses > the Booter Unloader execution entirely. > > Are we sure they really are independent? What does RM do? This is not really about being able to reload the driver afterwards if FWSEC-SB fails - in all likelihood we won't be able to, although that ultimately depends on where and how hard FWSEC-SB fails. The reason for doing this is more to stay consistent with our teardown policy [1], which is to keep going even if one step fails. [1] https://lore.kernel.org/all/[email protected]/
