A new Gallium HUD "value producer" could be added that reads fdinfo without
calling the driver. I still think there is merit in having this in
amdgpu_drm.h too.

Marek

On Tue, Jan 24, 2023 at 3:13 AM Marek Olšák <mar...@gmail.com> wrote:

> The table of exposed driver-specific counters:
>
> https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/drivers/radeonsi/si_query.c#L1751
>
> Counter enums. They use the same interface as e.g. occlusion queries,
> except that begin_query and end_query save the results in the driver/CPU.
>
> https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/drivers/radeonsi/si_query.h#L45
>
> Counters exposed by the winsys:
>
> https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/include/winsys/radeon_winsys.h#L126
>
> I just need to query the counters in the winsys and return them.
>
> Marek
>
> On Tue, Jan 24, 2023 at 2:58 AM Christian König <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
>> How are the counters which the HUD consumes declared?
>>
>> See what I want to avoid is a) to nail down the interface with the kernel
>> on specific values and b) make it possible to easily expose new values.
>>
>> In other words what we could do with fdinfo is to have something like
>> this:
>>
>> GALLIUM_FDINFO_HUD=drm-memory-vram,amd-evicted-vram,amd-mclk glxgears
>>
>> And the HUD just displays the values the kernel provides without the need
>> to re-compile mesa when we want to add some more values nor have the values
>> as part of the UAPI.
>>
>> Christian.
>>
>> Am 24.01.23 um 08:37 schrieb Marek Olšák:
>>
>> The Gallium HUD doesn't consume strings. It only consumes values that are
>> exposed as counters from the driver. In this case, we need the driver to
>> expose evicted stats as counters. Each counter can set whether the value is
>> absolute (e.g. memory usage) or monotonic (e.g. perf counter). Parsing
>> fdinfo to get the values is undesirable.
>>
>> Marek
>>
>> On Mon, Jan 23, 2023 at 4:31 AM Christian König <
>> ckoenig.leichtzumer...@gmail.com> wrote:
>>
>>> Let's do this as valid in fdinfo.
>>>
>>> This way we can easily extend whatever the kernel wants to display as
>>> statistics in the userspace HUD.
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 21.01.23 um 01:45 schrieb Marek Olšák:
>>>
>>> We badly need a way to query evicted memory usage. It's essential for
>>> investigating performance problems and it uncovered the buddy allocator
>>> disaster. Please either suggest an alternative, suggest changes, or review.
>>> We need it ASAP.
>>>
>>> Thanks,
>>> Marek
>>>
>>> On Tue, Jan 10, 2023 at 11:55 AM Marek Olšák <mar...@gmail.com> wrote:
>>>
>>>> On Tue, Jan 10, 2023 at 11:23 AM Christian König <
>>>> ckoenig.leichtzumer...@gmail.com> wrote:
>>>>
>>>>> Am 10.01.23 um 16:28 schrieb Marek Olšák:
>>>>>
>>>>> On Wed, Jan 4, 2023 at 9:51 AM Christian König <
>>>>> ckoenig.leichtzumer...@gmail.com> wrote:
>>>>>
>>>>>> Am 04.01.23 um 00:08 schrieb Marek Olšák:
>>>>>>
>>>>>> I see about the access now, but did you even look at the patch?
>>>>>>
>>>>>>
>>>>>> I did look at the patch, but I haven't fully understood yet what you
>>>>>> are trying to do here.
>>>>>>
>>>>>
>>>>> First and foremost, it returns the evicted size of VRAM and visible
>>>>> VRAM, and returns visible VRAM usage. It should be obvious which stat
>>>>> includes the size of another.
>>>>>
>>>>>
>>>>>> Because what the patch does isn't even exposed to common drm code,
>>>>>> such as the preferred domain and visible VRAM placement, so it can't be 
>>>>>> in
>>>>>> fdinfo right now.
>>>>>>
>>>>>> Or do you even know what fdinfo contains? Because it contains nothing
>>>>>> useful. It only has VRAM and GTT usage, which we already have in the INFO
>>>>>> ioctl, so it has nothing that we need. We mainly need the eviction
>>>>>> information and visible VRAM information now. Everything else is a bonus.
>>>>>>
>>>>>>
>>>>>> Well the main question is what are you trying to get from that
>>>>>> information? The eviction list for example is completely meaningless to
>>>>>> userspace, that stuff is only temporary and will be cleared on the next 
>>>>>> CS
>>>>>> again.
>>>>>>
>>>>>
>>>>> I don't know what you mean. The returned eviction stats look correct
>>>>> and are stable (they don't change much). You can suggest changes if you
>>>>> think some numbers are not reported correctly.
>>>>>
>>>>>
>>>>>>
>>>>>> What we could expose is the VRAM over-commit value, e.g. how much BOs
>>>>>> which where supposed to be in VRAM are in GTT now. I think that's what 
>>>>>> you
>>>>>> are looking for here, right?
>>>>>>
>>>>>
>>>>> The VRAM overcommit value is "evicted_vram".
>>>>>
>>>>>
>>>>>>
>>>>>> Also, it's undesirable to open and parse a text file if we can just
>>>>>> call an ioctl.
>>>>>>
>>>>>>
>>>>>> Well I see the reasoning for that, but I also see why other drivers
>>>>>> do a lot of the stuff we have as IOCTL as separate files in sysfs, fdinfo
>>>>>> or debugfs.
>>>>>>
>>>>>> Especially repeating all the static information which were already
>>>>>> available under sysfs in the INFO IOCTL was a design mistake as far as I
>>>>>> can see. Just compare what AMDGPU and the KFD code is doing to what for
>>>>>> example i915 is doing.
>>>>>>
>>>>>> Same for things like debug information about a process. The fdinfo
>>>>>> stuff can be queried from external tools (gdb, gputop, umr etc...) as 
>>>>>> well
>>>>>> which makes that interface more preferred.
>>>>>>
>>>>>
>>>>> Nothing uses fdinfo in Mesa. No driver uses sysfs in Mesa except drm
>>>>> shims, noop drivers, and Intel for perf metrics. sysfs itself is an
>>>>> unusable mess for the PCIe query and is missing information.
>>>>>
>>>>> I'm not against exposing more stuff through sysfs and fdinfo for
>>>>> tools, but I don't see any reason why drivers should use it (other than 
>>>>> for
>>>>> slowing down queries and initialization).
>>>>>
>>>>>
>>>>> That's what I'm asking: Is this for some tool or to make some driver
>>>>> decision based on it?
>>>>>
>>>>> If you just want the numbers for over displaying then I think it would
>>>>> be better to put this into fdinfo together with the other existing stuff
>>>>> there.
>>>>>
>>>>
>>>>> If you want to make allocation decisions based on this then we should
>>>>> have that as IOCTL or even better as mmap() page between kernel and
>>>>> userspace. But in this case I would also calculation the numbers 
>>>>> completely
>>>>> different as well.
>>>>>
>>>>> See we have at least the following things in the kernel:
>>>>> 1. The eviction list in the VM.
>>>>>     Those are the BOs which are currently evicted and tried to moved
>>>>> back in on the next CS.
>>>>>
>>>>> 2. The VRAM over commit value.
>>>>>     In other words how much more VRAM than available has the
>>>>> application tried to allocate?
>>>>>
>>>>> 3. The visible VRAM usage by this application.
>>>>>
>>>>> The end goal is that the eviction list will go away, e.g. we will
>>>>> always have stable allocations based on allocations of other applications
>>>>> and not constantly swap things in and out.
>>>>>
>>>>> When you now expose the eviction list to userspace we will be stuck
>>>>> with this interface forever.
>>>>>
>>>>
>>>> It's for the GALLIUM HUD.
>>>>
>>>> The only missing thing is the size of all evicted VRAM allocations, and
>>>> the size of all evicted visible VRAM allocations.
>>>>
>>>> 1. No list is exposed. Only sums of buffer sizes are exposed. Also, the
>>>> eviction list has no meaning here. All lists are treated equally, and
>>>> mem_type is compared with preferred_domains to determine where buffers are
>>>> and where they should be.
>>>>
>>>> 2. I'm not interested in the overcommit value. I'm only interested in
>>>> knowing the number of bytes of evicted VRAM right now. It can be as
>>>> variable as the CPU load, but in practice it shouldn't be because PCIe
>>>> doesn't have the bandwidth to move things quickly.
>>>>
>>>> 3. Yes, that's true.
>>>>
>>>> Marek
>>>>
>>>>
>>>
>>

Reply via email to