On Wed Sep 3, 2025 at 8:05 PM JST, Danilo Krummrich wrote:
> On Wed Sep 3, 2025 at 12:44 PM CEST, Alexandre Courbot wrote:
>> On Wed Sep 3, 2025 at 5:26 PM JST, Danilo Krummrich wrote:
>>> On Wed Sep 3, 2025 at 9:08 AM CEST, Alexandre Courbot wrote:
>>>> On Wed Sep 3, 2025 at 4:53 AM JST, Danilo Krummrich wrote:
>>>>> On Tue Sep 2, 2025 at 4:31 PM CEST, Alexandre Courbot wrote:
>>>>>> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
>>>>>> index 274989ea1fb4a5e3e6678a08920ddc76d2809ab2..1062014c0a488e959379f009c2e8029ffaa1e2f8 100644
>>>>>> --- a/drivers/gpu/nova-core/driver.rs
>>>>>> +++ b/drivers/gpu/nova-core/driver.rs
>>>>>> @@ -6,6 +6,8 @@
>>>>>>  
>>>>>>  #[pin_data]
>>>>>>  pub(crate) struct NovaCore {
>>>>>> +    // Placeholder for the real `Gsp` object once it is built.
>>>>>> +    pub(crate) gsp: (),
>>>>>>      #[pin]
>>>>>>      pub(crate) gpu: Gpu,
>>>>>>      _reg: auxiliary::Registration,
>>>>>> @@ -40,8 +42,14 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> Result<Pin<KBox<Self
>>>>>>          )?;
>>>>>>  
>>>>>>          let this = KBox::pin_init(
>>>>>> -            try_pin_init!(Self {
>>>>>> +            try_pin_init!(&this in Self {
>>>>>>                  gpu <- Gpu::new(pdev, bar)?,
>>>>>> +                gsp <- {
>>>>>> +                    // SAFETY: `this.gpu` is initialized to a valid value.
>>>>>> +                    let gpu = unsafe { &(*this.as_ptr()).gpu };
>>>>>> +
>>>>>> +                    gpu.start_gsp(pdev)?
>>>>>> +                },
>>>>>
>>>>> Please use pin_chain() [1] for this.
>>>>
>>>> Sorry, but I couldn't figure out how I can use pin_chain here (and
>>>> couldn't find any relevant example in the kernel code either). Can you
>>>> elaborate a bit?
>>>
>>> I thought of just doing the following, which I think should be equivalent
>>> (diff against current nova-next).
>>>
>>> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
>>> index 274989ea1fb4..6d62867f7503 100644
>>> --- a/drivers/gpu/nova-core/driver.rs
>>> +++ b/drivers/gpu/nova-core/driver.rs
>>> @@ -41,7 +41,9 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> Result<Pin<KBox<Self
>>>
>>>          let this = KBox::pin_init(
>>>              try_pin_init!(Self {
>>> -                gpu <- Gpu::new(pdev, bar)?,
>>> +                gpu <- Gpu::new(pdev, bar)?.pin_chain(|gpu| {
>>> +                    gpu.start_gsp(pdev)
>>> +                }),
>>>                  _reg: auxiliary::Registration::new(
>>>                      pdev.as_ref(),
>>>                      c_str!("nova-drm"),
>>> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
>>> index 8caecaf7dfb4..211bc1a5a5b3 100644
>>> --- a/drivers/gpu/nova-core/gpu.rs
>>> +++ b/drivers/gpu/nova-core/gpu.rs
>>> @@ -266,7 +266,7 @@ fn run_fwsec_frts(
>>>      pub(crate) fn new(
>>>          pdev: &pci::Device<device::Bound>,
>>>          devres_bar: Arc<Devres<Bar0>>,
>>> -    ) -> Result<impl PinInit<Self>> {
>>> +    ) -> Result<impl PinInit<Self, Error>> {
>>>          let bar = devres_bar.access(pdev.as_ref())?;
>>>          let spec = Spec::new(bar)?;
>>>          let fw = Firmware::new(pdev.as_ref(), spec.chipset, FIRMWARE_VERSION)?;
>>> @@ -302,11 +302,16 @@ pub(crate) fn new(
>>>
>>>          Self::run_fwsec_frts(pdev.as_ref(), &gsp_falcon, bar, &bios, &fb_layout)?;
>>>
>>> -        Ok(pin_init!(Self {
>>> +        Ok(try_pin_init!(Self {
>>>              spec,
>>>              bar: devres_bar,
>>>              fw,
>>>              sysmem_flush,
>>>          }))
>>>      }
>>> +
>>> +    pub(crate) fn start_gsp(&self, _pdev: &pci::Device<device::Core>) -> Result {
>>> +        // noop
>>> +        Ok(())
>>> +    }
>>>  }
>>>
>>> But maybe it doesn't capture your intent?
>>
>> The issue is that `start_gsp` returns a value (currently a placeholder
>> `()`, but it will change into a real type) that needs to be stored into
>> the newly-introduced `gsp` member of `NovaCore`. I could not figure out
>> how `pin_chain` could help with this (and this is the same problem for
>> the other `unsafe` statements in `firmware/gsp.rs`).
>
> Ok, I see, I think Benno is already working on a solution to access previously
> initialized fields from subsequent initializers.
>
> @Benno: What's the status of this? I haven't seen an issue for that in the
> pin-init GitHub repo, should we create one?
>
> However, in this case I'm a bit confused why we want Gsp next to Gpu? Why not
> just make Gsp a member of Gpu then?

To be honest I am not completely sure about the best layout yet and will
need more visibility to understand whether this is optimal. But
considering that we want to run the GSP boot process over a built `Gpu`
instance, we cannot store the result of said process inside `Gpu` unless
we put it inside e.g. an `Option`. But then the variant will always be
`Some` after `probe` returns, and yet we will have to perform a match
every time we want to access it.
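
As a rough userspace illustration of that trade-off (plain `std` Rust,
not kernel code; all types, fields and the `cmdq_len` value are
hypothetical stand-ins, and ordinary moves replace in-place pinned
initialization here), the two layouts could be sketched like this:

```rust
// Userspace analogy only: every type and field here is a hypothetical
// stand-in, not the nova-core definition.
struct Gsp {
    cmdq_len: usize,
}

// Layout A: `Gsp` stored inside `Gpu`. `Gpu` must exist before the GSP can
// boot, so the field has to start out as `None`.
struct GpuWithGsp {
    chipset: u32,
    gsp: Option<Gsp>,
}

impl GpuWithGsp {
    fn new(chipset: u32) -> Self {
        Self { chipset, gsp: None }
    }

    fn start_gsp(&mut self) {
        // Pretend the queue size depends on the chipset.
        let cmdq_len = if self.chipset >= 0x170 { 32 } else { 16 };
        self.gsp = Some(Gsp { cmdq_len });
    }

    // Every accessor has to handle `None`, although it cannot occur once
    // probing has completed.
    fn cmdq_len(&self) -> Option<usize> {
        self.gsp.as_ref().map(|gsp| gsp.cmdq_len)
    }
}

// Layout B: `Gpu` and `Gsp` as siblings in the parent object.
struct Gpu {
    chipset: u32,
}

impl Gpu {
    fn start_gsp(&self) -> Gsp {
        let cmdq_len = if self.chipset >= 0x170 { 32 } else { 16 };
        Gsp { cmdq_len }
    }
}

struct NovaCore {
    gpu: Gpu,
    gsp: Gsp,
}

impl NovaCore {
    fn new(chipset: u32) -> Self {
        let gpu = Gpu { chipset };
        let gsp = gpu.start_gsp();
        Self { gpu, gsp }
    }

    // No `Option` to unwrap: the field is valid by construction.
    fn cmdq_len(&self) -> usize {
        self.gsp.cmdq_len
    }
}

fn main() {
    let mut a = GpuWithGsp::new(0x172);
    assert_eq!(a.cmdq_len(), None);
    a.start_gsp();
    assert_eq!(a.cmdq_len(), Some(32));

    let b = NovaCore::new(0x172);
    assert_eq!(b.cmdq_len(), 32);
    assert_eq!(b.gpu.chipset, 0x172);
}
```

In layout A the `Option` leaks into every accessor, which is the cost
described above; layout B avoids it but needs `Gpu` to be usable while
its sibling is being built.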

The current separation sounds reasonable to me for the time being, with
`Gpu` containing purely hardware resources obtained without help from
user-space, while `Gsp` is the result of running a bunch of firmwares.
An alternative design would be to store `Gpu` inside `Gsp`, but `Gsp`
inside `Gpu` is trickier due to the build order. No matter what we do,
switching the layout later should be trivial if we don't choose the
best one now.

There is also an easy workaround to the sibling initialization issue,
which is to store `Gpu` and `Gsp` behind `Pin<KBox>` - that way we can
initialize both outside `try_pin_init!`, at the cost of two more heap
allocations over the whole lifetime of the device. If we don't have a
proper solution to the problem now, this might be better than using
`unsafe` as a temporary solution.
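
A minimal sketch of that workaround, assuming a userspace analogy where
`Pin<Box<T>>` stands in for `Pin<KBox<T>>` and where `probe`, the
`String` error type and all fields are hypothetical simplifications
rather than the actual driver code:

```rust
use std::pin::Pin;

// Hypothetical stand-ins for the driver types.
struct Gpu {
    chipset: u32,
}

struct Gsp {
    cmdq_len: usize,
}

impl Gpu {
    // First heap allocation: `Gpu` is pinned in its own box.
    fn new(chipset: u32) -> Result<Pin<Box<Self>>, String> {
        Ok(Box::pin(Self { chipset }))
    }

    // Second heap allocation: `Gsp` is built from a fully initialized `Gpu`.
    fn start_gsp(&self) -> Result<Pin<Box<Gsp>>, String> {
        if self.chipset == 0 {
            return Err("unknown chipset".to_string());
        }
        Ok(Box::pin(Gsp { cmdq_len: 16 }))
    }
}

// The parent only stores two pinned pointers, so it needs no in-place
// initializer that refers to a sibling field.
struct NovaCore {
    gpu: Pin<Box<Gpu>>,
    gsp: Pin<Box<Gsp>>,
}

fn probe(chipset: u32) -> Result<NovaCore, String> {
    let gpu = Gpu::new(chipset)?;
    // `gpu` is complete here, so no `unsafe` access is required.
    let gsp = gpu.start_gsp()?;
    // Moving the boxes does not move the pinned contents.
    Ok(NovaCore { gpu, gsp })
}

fn main() {
    let core = probe(0x172).unwrap();
    assert_eq!(core.gpu.chipset, 0x172);
    assert_eq!(core.gsp.cmdq_len, 16);
    assert!(probe(0).is_err());
}
```

The two extra allocations are the price for keeping the initialization
order explicit and free of `unsafe`.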

The same workaround could also be used for `GspFirmware` and its page
tables - since `GspFirmware` is temporary and can apparently be
discarded after the GSP is booted, this shouldn't be a big issue. This
will allow the driver to probe, and we can add TODO items to fix that
later if a solution is in sight.

>
> I thought the intent was to keep temporary values local to start_gsp() and not
> store them next to Gpu in the same allocation?

It is not visible in the current patchset, but `start_gsp` will
eventually return the runtime data of the GSP - notably its log buffers
and command queue, which are needed to operate it. All the rest (notably
the loaded firmwares) will be local to `start_gsp` and discarded upon
its return.
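
To make the intended lifetime split concrete, here is a small userspace
sketch (plain `std` Rust; the field names, buffer sizes and the
`FIRMWARE_DROPPED` flag are assumptions made purely for illustration,
not the planned implementation):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Records that the temporary firmware object has been released.
static FIRMWARE_DROPPED: AtomicBool = AtomicBool::new(false);

// Hypothetical temporary state, only needed while booting.
struct GspFirmware {
    image: Vec<u8>,
}

impl Drop for GspFirmware {
    fn drop(&mut self) {
        FIRMWARE_DROPPED.store(true, Ordering::SeqCst);
    }
}

// Hypothetical runtime state that outlives the boot process.
struct Gsp {
    log_buffer: Vec<u8>,
    cmdq: Vec<u32>,
}

fn start_gsp() -> Gsp {
    // The firmware image is local to this function.
    let fw = GspFirmware { image: vec![0u8; 4096] };

    // Stand-in for booting: derive the runtime buffers from the image.
    Gsp {
        log_buffer: vec![0u8; fw.image.len() / 4],
        cmdq: Vec::with_capacity(64),
    }
    // `fw` goes out of scope here and is dropped; only `Gsp` is returned.
}

fn main() {
    let gsp = start_gsp();
    assert!(FIRMWARE_DROPPED.load(Ordering::SeqCst));
    assert_eq!(gsp.log_buffer.len(), 1024);
    assert!(gsp.cmdq.capacity() >= 64);
}
```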
