On Mon, Mar 02, 2026 at 08:58:23PM +0900, Alexandre Courbot wrote:
> > +//! controlled by the `NV_PBUS_BAR0_WINDOW` register and must be 64KB
> > aligned.
>
> s/must be/is - it's not like that hardware is giving us a choice anyway
> since we cannot even express a non-aligned value in the window register.
Done.
> > + let bar = self.bar.try_access().ok_or(ENODEV)?;
>
> Ouch, we are calling `try_access` for every single read or write
> operation.
> > + bar: Arc<Devres<Bar0>>,
>
> Cloning the `Arc` is what forces us to do a `try_access` on every
> read/write operation. [...] you can just call `try_access` in `window`
> and store that. This turns `bar` into:
>
> bar: RevocableGuard<'a, Bar0>,
Agreed, this is a lot better and has fewer lines of code too. I have
made the change.
> > +struct PraminState {
> > + current_base: usize,
> > +}
>
> This has only one member and no impl block of its own - shall we inline
> it?
Agreed. Inlined to Mutex<usize>.
> > + /// This lock is acquired during the DMA fence signalling critical
> > path.
>
> nit: s/signalling/signaling
Done.
> > + pub(crate) fn window(&self) -> Result<PraminWindow<'_>> {
>
> The name of this method is strange - we don't pass the base of any
> window area. It looks more like it is actually acquiring the `Pramin`
> for exclusive access.
How about `get_window()`? Open to other suggestions too.
> Also fun question: what happens if we try to access an area above (or
> below) the available VRAM?
Good catch, I added further hardening to check for that in next spin. And
as a result, I can also drop this check now "Validate within hardware
addressable range (40-bit address space)" since it will be covered by the
VRAM region check.
Also I was doing unnecessary window repositioning: we were computing a new
base for every 64KB boundary crossing, fixed now.
> > + fn try_read_window_base(bar: &Bar0) -> Result<usize> {
> > + let shifted = base.checked_shl(16).ok_or(EOVERFLOW)?;
> > + shifted.try_into().map_err(|_| EOVERFLOW)
>
> This function is actually infallible. [...] For now can you use a regular
> shift operation with a TODO to convert to `Bounded`?
Done. Renamed to read_window_base(), uses plain shift and u64_as_usize.
Added TODO for Bounded.
> > + saved_base: usize,
>
> What is the point of saving and restoring the original position?
> > impl Drop for PraminWindow<'_> { ...
>
> Let's drop this alongside the original window base restoration, which
> doesn't serve any purpose.
I was trying to avoid side effects for other users of PRAMIN, but you
are right, that is too defensive with handling this. Will simplify.
> > + state: MutexGuard<'a, PraminState>,
>
> I'm wondering whether we should remove the `Mutex` from `Pramin` and
> make its methods take `&mut self`.
Actually I would like not to do that because it complicates the design -
`&mut self` on Pramin then requires `&mut GpuMm` everywhere, cascading
through all callers. I had tried that path initially but ended up with
the internal Mutex approach and concluded it is cleaner. I think we can
start with this and refine as needed - let me know if you're okay with
that. Similar reasoning for the buddy allocator, where I implemented
internal locking there after trying the mutable reference path.
> > + fn write_window_base(bar: &Bar0, base: usize) {
>
> `base` should be the native type of the GPU, i.e. `u64`.
Makes sense. I changed it throughout the file to be that way. Btw, I left
the BAR0 offsets (bar_offset, offset_in_window) remain usize since those
are host MMIO offsets. We could argue that is 'an offset into VRAM though
which is GPU addresses' but I guess we have to draw the line somewhere :)
> > + // CAST:
> > + // - We have guaranteed that the base is within the addressable
> > range (40-bits).
>
> This function does not guarantee anything of the sort - even the caller
> doesn't, this actually relies on the behavior of `compute_window`.
Fixed. Reworded to: "The caller (compute_window) validates that base is
within the VRAM region which is always <= 40 bits. After >> 16, a 40-bit
base becomes 24 bits, which fits in u32."
> > + if end_offset > MAX_VRAM_OFFSET + 1 {
>
> If we are going to use `MAX_VRAM_OFFSET` like this, let's define it as
> `1 << 40` and turn this into `if end_offset > MAX_VRAM_OFFSET`.
Removed entirely - the 40-bit hardware check is now redundant since
compute_window validates against the actual vram_region (which is
always <= 40 bits) as per above.
> > + if offset_in_window + access_size > PRAMIN_SIZE {
>
> Here `offset_in_window` cannot be larger than 64K because of the line
> before - this check is effectively dead code.
This was true with the old code that aligned to 64KB on every access.
With the repositioning fix (we now check if the access fits in the
current 1MB window first), offset_in_window can be up to ~1MB, so this
check is now live and needed.
> > + 25:24 target as u8, "Target memory (0=VRAM, 1=SYS_MEM_COH,
> > 2=SYS_MEM_NONCOH)";
>
> This should be converted to an enum. [...] a single-variant enum will
> document that fact and force us to set the correct value.
Done. Added Bar0WindowTarget enum with a single variant.
thanks,
--
Joel Fernandes