On 09/02/2026 15:35, Boris Brezillon wrote:
> On Mon, 9 Feb 2026 15:22:09 +0000
> Liviu Dudau <[email protected]> wrote:
> 
>>>> Ultimately the role of this RFC is to start a discussion and to figure out
>>>> a path forward for CSF GPUs where we now want to tighten a bit the formats
>>>> we support and add PBHA, and in the future we want to add support for v15+
>>>> page formats.
>>>
>>> PBHA is definitely an area for discussion. AIUI there are out-of-tree
>>> patches floating about for CPU support, but it hasn't been upstreamed. I
>>> don't know if any serious attempt has been made to push it upstream, but
>>> it's tricky because the architecture basically just says "IMPLEMENTATION
>>> DEFINED" which means you are no longer coding to the architecture but a
>>> specific implementation - and there's remarkably little documentation
>>> about what PBHA is used for in practice.
>>>
>>> I haven't looked into the GPU situation with PBHA - again it would be
>>> good to have more details on how the bits would be set.  
>>
>> I have a patch series that adds support in Panthor to apply some PBHA bits
>> defined in the DT based on an ID also defined in the DT and passed along as
>> a VM_BIND parameter if you want to play with it. However I have no direct
>> knowledge on which PBHA values would make a difference on the supported
>> platforms (RK3xxx for example).

So we need something better than a DT entry saying e.g. "ID 3 is bit
pattern 0100". We need something that describes the actual behaviour of
a PBHA value. Otherwise user space will end up needing to know the exact
hardware platform it's running on to know what ID values mean.

> I don't know if that's what it's going to be used for, but one very
> specific use case I'd like to see this PBHA extension backed by is
> "read-zero/write-discard" behavior that's needed for sparse bindings.
> Unfortunately, I've not heard of any HW-support for that in older
> gens...

*This* is a good example of something useful that could be exposed. If
the DT can describe that the hardware supports a
"read-zero/write-discard" with a specific bit pattern, then we can
advertise that to user space and provide a flag for VM_BIND which gives
that behaviour. And user space can make good use of it.
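
To make that concrete, here is a rough sketch of the shape I have in mind.
Every name below is made up for illustration (the `hyp_`/`HYP_` prefix marks
it as hypothetical, none of it is taken from the Panthor uAPI or an existing
DT binding), and it assumes a 4-bit PBHA field where 0 means "no override".
The point is only that user space asks for a behaviour, and the kernel maps
it to whatever bit pattern the platform description provides, or rejects it:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical behaviours a platform description could name. */
enum hyp_pbha_behaviour {
	HYP_PBHA_READ_ZERO_WRITE_DISCARD,
	HYP_PBHA_BEHAVIOUR_COUNT,
};

/* Hypothetical per-behaviour data, e.g. as parsed from the DT. */
struct hyp_pbha_cap {
	bool supported;
	uint8_t bits;	/* assumed 4-bit PBHA pattern, e.g. 0x4 for "0100" */
};

struct hyp_pbha_platform {
	struct hyp_pbha_cap caps[HYP_PBHA_BEHAVIOUR_COUNT];
};

/* Hypothetical VM_BIND flag, not part of any real uAPI. */
#define HYP_VM_BIND_READ_ZERO_WRITE_DISCARD (1u << 0)

/* Flags that can be advertised to user space on this platform. */
uint32_t hyp_pbha_advertised_flags(const struct hyp_pbha_platform *p)
{
	uint32_t flags = 0;

	if (p->caps[HYP_PBHA_READ_ZERO_WRITE_DISCARD].supported)
		flags |= HYP_VM_BIND_READ_ZERO_WRITE_DISCARD;

	return flags;
}

/*
 * Translate bind flags into PBHA bits. Returns 0 on success and -1 if
 * user space asked for a behaviour the platform did not describe. A
 * functional behaviour must be rejected rather than silently ignored.
 */
int hyp_pbha_resolve(const struct hyp_pbha_platform *p, uint32_t flags,
		     uint8_t *bits)
{
	*bits = 0;	/* assumption: 0 means "no PBHA override" */

	if (flags & ~hyp_pbha_advertised_flags(p))
		return -1;

	if (flags & HYP_VM_BIND_READ_ZERO_WRITE_DISCARD)
		*bits = p->caps[HYP_PBHA_READ_ZERO_WRITE_DISCARD].bits;

	return 0;
}
```

With something of this shape user space never sees "ID 3" or "0100", only
whether the behaviour is available.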

But from what I've heard the implementations tend to have something more
like a hint-mechanism where it affects the behaviour of the caches but
not the functional effect. This makes it much harder to expose to user
space in a meaningful way because it's highly platform-dependent what
"don't allocate in the system level cache" actually means in terms of
performance effects. But it's possible we could describe more of a
usage-based flag - i.e. "PBHA bits good for a tiler heap".
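
A usage-based variant might look something like the sketch below. Again the
names are purely hypothetical and the 4-bit field with 0 as "no hint" is an
assumption. The difference from a functional flag is that, since a hint
doesn't change behaviour, a platform that says nothing about a usage can
simply fall back to no hint instead of failing the request:

```c
#include <stdint.h>

/* Hypothetical usage classes user space could pass with a mapping. */
enum hyp_bo_usage {
	HYP_BO_USAGE_DEFAULT,
	HYP_BO_USAGE_TILER_HEAP,
	HYP_BO_USAGE_COUNT,
};

/* Hypothetical per-platform table, e.g. filled in from the DT. */
struct hyp_pbha_hints {
	uint8_t bits[HYP_BO_USAGE_COUNT];	/* 0 = no hint described */
};

/*
 * Pick the PBHA hint bits for a usage. Unknown usages fall back to 0
 * (no hint) because a hint has no functional effect.
 */
uint8_t hyp_pbha_hint_for_usage(const struct hyp_pbha_hints *h,
				enum hyp_bo_usage usage)
{
	if ((unsigned int)usage >= HYP_BO_USAGE_COUNT)
		return 0;

	return h->bits[usage] & 0xf;	/* assumed 4-bit PBHA field */
}
```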

Thanks,
Steve
