Mhm, that sucks. Could we have the automated builds check for padding holes in the UAPI data structures?
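
For example, a build-time guard might catch this (a minimal sketch with
a hypothetical struct; static_assert comes from <linux/build_bug.h> and
fails the build if the compiler inserts unexpected padding):

    #include <linux/build_bug.h>
    #include <linux/types.h>

    struct example_uapi_info {
            __u32 pcie_gen;
            __u32 pcie_num_lanes;   /* fills what would otherwise be a hole */
            __u64 memory_size;
    };

    /* Hand-computed packed size: 4 + 4 + 8 = 16 bytes. */
    static_assert(sizeof(struct example_uapi_info) == 16,
                  "unexpected padding in example_uapi_info");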

Christian.

On 14.01.23 at 00:33, Marek Olšák wrote:
There is no hole on 32-bit unfortunately. It looks like the hole on 64-bit is now ABI.

I moved the field to replace _pad1. The patch is attached (with your Rb).
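
For reference, the situation looks roughly like this (an illustrative
layout, not the actual amdgpu struct):

    struct example_info {
            __u32 a;
            __u32 _pad1;    /* explicit padding; reusing it for the new
                             * field keeps the layout identical on all
                             * ABIs */
            __u64 b;        /* 8-byte aligned on 64-bit, but only 4-byte
                             * aligned on 32-bit x86, so without _pad1
                             * there would be a hole on 64-bit only */
    };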

Marek

On Fri, Jan 13, 2023 at 4:20 PM Alex Deucher <alexdeuc...@gmail.com> wrote:

    On Fri, Jan 13, 2023 at 4:02 PM Marek Olšák <mar...@gmail.com> wrote:
    >
    > I've added the comments, and indeed pahole shows the hole as
    expected.

    What about on 32-bit?

    Alex

    >
    > Marek
    >
    > On Thu, Jan 12, 2023 at 11:44 AM Alex Deucher
    <alexdeuc...@gmail.com> wrote:
    >>
    >> On Thu, Jan 12, 2023 at 6:50 AM Christian König
    >> <christian.koe...@amd.com> wrote:
    >> >
    >> > On 11.01.23 at 21:48, Alex Deucher wrote:
    >> > > On Wed, Jan 4, 2023 at 3:17 PM Marek Olšák
    <mar...@gmail.com> wrote:
    >> > >> Yes, it's meant to be like a spec sheet. We are not
    interested in the current bandwidth utilization.
    >> > > After chatting with Marek on IRC and thinking about this
    more, I think
    >> > > this patch is fine.  It's not really meant for bandwidth
    per se, but
    >> > > rather as a limit to determine what the driver should do in
    certain
    >> > > cases (i.e., when it makes sense to copy to VRAM or
    not).  It's
    >> > > not straightforward for userspace to parse the full topology to
    >> > > determine what links may be slow.  I guess one potential
    pitfall would
    >> > > be that if you pass the device into a VM, the driver may
    report the
    >> > > wrong values.  Generally, a guest VM doesn't get the
    full view up
    >> > > to the root port.  I don't know if the hypervisors report
    properly for
    >> > > pcie_bandwidth_available() in a VM or if it just shows the
    info about
    >> > > the endpoint in the VM.
    >> >
    >> > So this basically doesn't return the gen and lanes of the
    device, but
    >> > rather what was negotiated between the device and the
    upstream root port?
    >>
    >> Correct. It exposes the max gen and lanes of the slowest link
    between
    >> the device and the root port.
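    >>
    >> For illustration, a driver could query exactly that with the
    >> kernel's pcie_bandwidth_available() helper. A minimal sketch
    >> (the wrapper function name here is made up):
    >>
    >>     #include <linux/pci.h>
    >>
    >>     /* Returns the bandwidth (in Mb/s) actually available to the
    >>      * device: pcie_bandwidth_available() walks each link from
    >>      * pdev up to the root port, reports the slowest hop, and
    >>      * fills in the speed/width of that limiting link. */
    >>     static u32 example_available_bw(struct pci_dev *pdev)
    >>     {
    >>             enum pci_bus_speed speed;
    >>             enum pcie_link_width width;
    >>             struct pci_dev *limiting;
    >>
    >>             return pcie_bandwidth_available(pdev, &limiting,
    >>                                             &speed, &width);
    >>     }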
    >>
    >> >
    >> > If I got that correctly, then we should probably document that,
    because
    >> > otherwise somebody will try to "fix" it at some time.
    >>
    >> Good point.
    >>
    >> Alex
    >>
    >> >
    >> > Christian.
    >> >
    >> > >
    >> > > Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>
    >> > >
    >> > > Alex
    >> > >
    >> > >> Marek
    >> > >>
    >> > >> On Wed, Jan 4, 2023 at 10:33 AM Lazar, Lijo
    <lijo.la...@amd.com> wrote:
    >> > >>>
    >> > >>> To clarify, with DPM in place, the current bandwidth will
    change based on the load.
    >> > >>>
    >> > >>> If apps/UMDs already have a way to know the current
    bandwidth utilisation, then the possible maximum could also be
    part of the same API. Otherwise, this only looks like duplicate
    information; we have the same information in the sysfs DPM nodes.
    >> > >>>
    >> > >>> BTW, I don't know to what extent apps/UMDs really make use
    of this. Take the memory frequency as an example (I'm reading it
    as 16 GHz). It only looks like a spec sheet.
    >> > >>>
    >> > >>> Thanks,
    >> > >>> Lijo
    >> > >>> ________________________________
    >> > >>> From: Marek Olšák <mar...@gmail.com>
    >> > >>> Sent: Wednesday, January 4, 2023 8:40:00 PM
    >> > >>> To: Lazar, Lijo <lijo.la...@amd.com>
    >> > >>> Cc: amd-gfx@lists.freedesktop.org
    <amd-gfx@lists.freedesktop.org>
    >> > >>> Subject: Re: [PATCH 1/2] drm/amdgpu: return the PCIe gen
    and lanes from the INFO
    >> > >>>
    >> > >>> On Wed, Jan 4, 2023 at 9:19 AM Lazar, Lijo
    <lijo.la...@amd.com> wrote:
    >> > >>>
    >> > >>>
    >> > >>>
    >> > >>> On 1/4/2023 7:43 PM, Marek Olšák wrote:
    >> > >>>> On Wed, Jan 4, 2023 at 6:50 AM Lazar, Lijo
    <lijo.la...@amd.com
    >> > >>>> <mailto:lijo.la...@amd.com>> wrote:
    >> > >>>>
    >> > >>>>
    >> > >>>>
    >> > >>>>      On 1/4/2023 4:11 AM, Marek Olšák wrote:
    >> > >>>>       > I see. Well, those sysfs files are not usable,
    and I don't think it
    >> > >>>>       > would be important even if they were usable, but
    for completeness:
    >> > >>>>       >
    >> > >>>>       > The ioctl returns:
    >> > >>>>       >      pcie_gen = 1
    >> > >>>>       > pcie_num_lanes = 16
    >> > >>>>       >
    >> > >>>>       > Theoretical bandwidth from those values: 4.0 GB/s
    >> > >>>>       > My DMA test shows this write bandwidth: 3.5 GB/s
    >> > >>>>       > It matches the expectation.
    >> > >>>>       >
    >> > >>>>       > Let's see the devices (there is only 1 GPU
    Navi21 in the system):
    >> > >>>>       > $ lspci |egrep '(PCI|VGA).*Navi'
    >> > >>>>       > 0a:00.0 PCI bridge: Advanced Micro Devices, Inc.
    [AMD/ATI] Navi
    >> > >>>>      10 XL
    >> > >>>>       > Upstream Port of PCI Express Switch (rev c3)
    >> > >>>>       > 0b:00.0 PCI bridge: Advanced Micro Devices, Inc.
    [AMD/ATI] Navi
    >> > >>>>      10 XL
    >> > >>>>       > Downstream Port of PCI Express Switch
    >> > >>>>       > 0c:00.0 VGA compatible controller: Advanced
    Micro Devices, Inc.
    >> > >>>>       > [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900
    XT] (rev c3)
    >> > >>>>       >
    >> > >>>>       > Let's read sysfs:
    >> > >>>>       >
    >> > >>>>       > $ cat
    /sys/bus/pci/devices/0000:0a:00.0/current_link_width
    >> > >>>>       > 16
    >> > >>>>       > $ cat
    /sys/bus/pci/devices/0000:0b:00.0/current_link_width
    >> > >>>>       > 16
    >> > >>>>       > $ cat
    /sys/bus/pci/devices/0000:0c:00.0/current_link_width
    >> > >>>>       > 16
    >> > >>>>       > $ cat
    /sys/bus/pci/devices/0000:0a:00.0/current_link_speed
    >> > >>>>       > 2.5 GT/s PCIe
    >> > >>>>       > $ cat
    /sys/bus/pci/devices/0000:0b:00.0/current_link_speed
    >> > >>>>       > 16.0 GT/s PCIe
    >> > >>>>       > $ cat
    /sys/bus/pci/devices/0000:0c:00.0/current_link_speed
    >> > >>>>       > 16.0 GT/s PCIe
    >> > >>>>       >
    >> > >>>>       > Problem 1: None of the speed numbers match 4 GB/s.
    >> > >>>>
    >> > >>>>      US bridge = 2.5 GT/s means it is operating at PCIe
    Gen1 speed. The total
    >> > >>>>      theoretical bandwidth is then derived from the
    encoding and the total
    >> > >>>>      number of lanes.
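    >> > >>>>
    >> > >>>>      As a worked example (approximate per-lane rates after
    >> > >>>>      encoding overhead):
    >> > >>>>
    >> > >>>>        Gen1   2.5 GT/s   8b/10b      ~250 MB/s per lane
    >> > >>>>        Gen2   5.0 GT/s   8b/10b      ~500 MB/s per lane
    >> > >>>>        Gen3   8.0 GT/s   128b/130b   ~985 MB/s per lane
    >> > >>>>        Gen4   16 GT/s    128b/130b   ~1969 MB/s per lane
    >> > >>>>        Gen5   32 GT/s    128b/130b   ~3938 MB/s per lane
    >> > >>>>
    >> > >>>>      So Gen1 x16 gives 250 MB/s * 16 = 4.0 GB/s, matching
    >> > >>>>      the ioctl-derived figure above.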
    >> > >>>>
    >> > >>>>       > Problem 2: Userspace doesn't know the bus index
    of the bridges,
    >> > >>>>      and it's
    >> > >>>>       > not clear which bridge should be used.
    >> > >>>>
    >> > >>>>      In general, modern ones have this architecture:
    US -> DS -> EP. The US (upstream) port is the one
    >> > >>>>      connected to the physical link.
    >> > >>>>
    >> > >>>>       > Problem 3: The PCIe gen number is missing.
    >> > >>>>
    >> > >>>>      Current link speed is based on whether it's
    Gen1/2/3/4/5.
    >> > >>>>
    >> > >>>>      BTW, your patch makes use of capability flags,
    which give the maximum
    >> > >>>>      speed/width supported by the device. They may not
    necessarily reflect the
    >> > >>>>      currently negotiated speed/width. I guess on NV, this
    info is already
    >> > >>>>      obtained from the PMFW and made available through
    the metrics table.
    >> > >>>>
    >> > >>>>
    >> > >>>> It computes the minimum of the device PCIe gen and the
    motherboard/slot
    >> > >>>> PCIe gen to get the final value. These 2 lines do that.
    The low 16 bits
    >> > >>>> of the mask contain the device PCIe gen mask. The high
    16 bits of the
    >> > >>>> mask contain the slot PCIe gen mask.
    >> > >>>> + pcie_gen_mask = adev->pm.pcie_gen_mask & (adev->pm.pcie_gen_mask >> 16);
    >> > >>>> + dev_info->pcie_gen = fls(pcie_gen_mask);
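    >> > >>>>
    >> > >>>> A worked example (using the bit layout described above:
    >> > >>>> bit N-1 = GenN, device mask in the low 16 bits, slot mask
    >> > >>>> in the high 16 bits). Say the device supports Gen1-4 and
    >> > >>>> the slot supports Gen1-3:
    >> > >>>>
    >> > >>>>   pcie_gen_mask                         = 0x0007000F
    >> > >>>>   pcie_gen_mask >> 16                   = 0x00000007
    >> > >>>>   pcie_gen_mask & (pcie_gen_mask >> 16) = 0x00000007
    >> > >>>>   fls(0x00000007)                       = 3   /* Gen3 */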
    >> > >>>>
    >> > >>> With DPM in place on some ASICs, how much does this
    static info help for
    >> > >>> upper level apps?
    >> > >>>
    >> > >>>
    >> > >>> It helps UMDs make better decisions if they know the
    maximum achievable bandwidth. UMDs also compute the maximum memory
    bandwidth and compute performance (FLOPS). Right now it's printed
    by Mesa to give users detailed information about their GPU. For
    example:
    >> > >>>
    >> > >>> $ AMD_DEBUG=info glxgears
    >> > >>> Device info:
    >> > >>>      name = NAVI21
    >> > >>>      marketing_name = AMD Radeon RX 6800
    >> > >>>      num_se = 3
    >> > >>>      num_rb = 12
    >> > >>>      num_cu = 60
    >> > >>>      max_gpu_freq = 2475 MHz
    >> > >>>      max_gflops = 19008 GFLOPS
    >> > >>>      l0_cache_size = 16 KB
    >> > >>>      l1_cache_size = 128 KB
    >> > >>>      l2_cache_size = 4096 KB
    >> > >>>      l3_cache_size = 128 MB
    >> > >>>      memory_channels = 16 (TCC blocks)
    >> > >>>      memory_size = 16 GB (16384 MB)
    >> > >>>      memory_freq = 16 GHz
    >> > >>>      memory_bus_width = 256 bits
    >> > >>>      memory_bandwidth = 512 GB/s
    >> > >>>      pcie_gen = 1
    >> > >>>      pcie_num_lanes = 16
    >> > >>>      pcie_bandwidth = 4.0 GB/s
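    >> > >>>
    >> > >>> As a cross-check of those figures (assuming RDNA2: 64 lanes
    >> > >>> per CU and 2 flops per lane per clock for FMA):
    >> > >>>
    >> > >>>     max_gflops       = 2475 MHz * 60 CUs * 64 * 2  = 19008 GFLOPS
    >> > >>>     memory_bandwidth = 16 GHz * 256 bits / 8       = 512 GB/s
    >> > >>>     pcie_bandwidth   = Gen1 x16 = 16 * 250 MB/s    = 4.0 GB/s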
    >> > >>>
    >> > >>> Marek
    >> >
