[PATCH V2] drm/amdgpu: Fix ras mode2 reset failure in ras aca mode

2024-04-23 Thread YiPeng Chai
Fix ras mode2 reset failure in ras aca mode. Signed-off-by: YiPeng Chai --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index edb3cd0cef96..11a70991152c 100644

[PATCH] drm/amdkfd: Check debug trap enable before write dbg_ev_file

2024-04-23 Thread Lin . Cao
In interrupt context, the write to dbg_ev_file is run by a work queue. The write can therefore execute after debug_trap_disable, which causes a NULL pointer access. Signed-off-by: Lin.Cao --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)

Re: [PATCH 2/4] Initialize the last_jump_jiffies in atom_exec_context before it used

2024-04-23 Thread Alex Deucher
On Tue, Apr 23, 2024 at 11:07 PM wrote: > > From: Jesse Zhang > > The parameter "last_jump_jiffies" should be initialized before being used in > the function atom_op_jump. > > Signed-off-by: Jesse Zhang Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/atom.c | 1 + > 1 file

Re: [PATCH 1/4] drm/amdgpu: add check before free wb entry

2024-04-23 Thread Alex Deucher
On Tue, Apr 23, 2024 at 11:27 PM wrote: > > From: Jesse Zhang > > check if ring is not mes queue before free wb entry. Minor clarification to the commit text: Check if ring is not a mes queue before freeing the wb entry because we only allocate a wb entry when it's not a mes queue. With that

[PATCH 4/4] drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc

2024-04-23 Thread jesse.zhang
From: Jesse Zhang Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x0301. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c

[PATCH] drm/amdgpu: skip to create ras xxx_err_count node when ACA is enabled

2024-04-23 Thread Yang Wang
skip to create 'xxx_err_count' node when ACA is enabled. Signed-off-by: Yang Wang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index

[PATCH 3/4] drm/amdgpu: Using uninitialized value new_state.jpeg when calling adev->vcn.pause_dpg_mode

2024-04-23 Thread jesse.zhang
From: Jesse Zhang Initialize new_state.jpeg before it is used. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c index

[PATCH 2/4] Initialize the last_jump_jiffies in atom_exec_context before it used

2024-04-23 Thread jesse.zhang
From: Jesse Zhang The parameter "last_jump_jiffies" should be initialized before being used in the function atom_op_jump. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/atom.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/atom.c

[PATCH 1/4] drm/amdgpu: add check before free wb entry

2024-04-23 Thread jesse.zhang
From: Jesse Zhang check if ring is not mes queue before free wb entry. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 3 ++- drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 3 ++- drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 ++- 3 files changed, 6 insertions(+), 3

Re: [PATCH] drm/amdgpu: Fixup bad vram size on gmc v6 and v7

2024-04-23 Thread Alex Deucher
On Tue, Apr 23, 2024 at 10:30 PM Qiang Ma wrote: > > On Mon, 22 Apr 2024 16:47:36 +0200 > Christian König wrote: > > > On 22.04.24 at 16:40, Alex Deucher wrote: > > > On Mon, Apr 22, 2024 at 9:00 AM Christian König > > > wrote: > > >> On 22.04.24 at 14:33, Qiang Ma wrote: > > >>> On Mon, 22

Re: [PATCH] drm/amdgpu: fix some uninitialized variables

2024-04-23 Thread Alex Deucher
Fix Leo's address. On Tue, Apr 23, 2024 at 10:33 PM Alex Deucher wrote: > > On Tue, Apr 23, 2024 at 10:04 PM Zhang, Jesse(Jie) > wrote: > > > > [AMD Official Use Only - General] > > > > Hi Alex, > > > > -Original Message- > > From: Alex Deucher > > Sent: Wednesday, April 24, 2024 9:46

Re: [PATCH] drm/amdgpu: fix some uninitialized variables

2024-04-23 Thread Alex Deucher
On Tue, Apr 23, 2024 at 10:04 PM Zhang, Jesse(Jie) wrote: > > [AMD Official Use Only - General] > > Hi Alex, > > -Original Message- > From: Alex Deucher > Sent: Wednesday, April 24, 2024 9:46 AM > To: Zhang, Jesse(Jie) > Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander > ; Koenig,

RE: [PATCH] drm/amdgpu: fix some uninitialized variables

2024-04-23 Thread Zhang, Jesse(Jie)
[AMD Official Use Only - General] Hi Alex, -Original Message- From: Alex Deucher Sent: Wednesday, April 24, 2024 9:46 AM To: Zhang, Jesse(Jie) Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander ; Koenig, Christian ; Huang, Tim Subject: Re: [PATCH] drm/amdgpu: fix some

Re: [PATCH] drm/amdgpu: fix some uninitialized variables

2024-04-23 Thread Alex Deucher
On Tue, Apr 23, 2024 at 9:27 PM Jesse Zhang wrote: > > Fix some variables not initialized before use. > Scan them out using Synopsys tools. > > Signed-off-by: Jesse Zhang > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 5 + >

[PATCH v2] drm/amd/display: Remove redundant NULL check in dcn10_set_input_transfer_func

2024-04-23 Thread Srinivasan Shanmugam
This commit removes an unnecessary NULL check in the `dcn10_set_input_transfer_func` function in the `dcn10_hwseq.c` file. The variable `tf` is assigned the address of `plane_state->in_transfer_func` unconditionally, so it can never be `NULL`. Therefore, the check `if (tf == NULL)` is unnecessary

[PATCH] drm/amd/display: Remove redundant NULL check in dce110_set_input_transfer_func

2024-04-23 Thread Srinivasan Shanmugam
This commit removes a redundant NULL check in the `dce110_set_input_transfer_func` function in the `dce110_hwseq.c` file. The variable `tf` is assigned the address of `plane_state->in_transfer_func` unconditionally, so it can never be `NULL`. Therefore, the check `if (tf == NULL)` is unnecessary

[PATCH] drm/amd/display: Remove redundant NULL check in dcn20_set_input_transfer_func

2024-04-23 Thread Srinivasan Shanmugam
This commit removes an unnecessary NULL check in the `dcn20_set_input_transfer_func` function in the `dcn20_hwseq.c` file. The variable `tf` is assigned the address of `plane_state->in_transfer_func` unconditionally, so it can never be `NULL`. Therefore, the check `if (tf == NULL)` is unnecessary

[PATCH] drm/amdgpu: fix some uninitialized variables

2024-04-23 Thread Jesse Zhang
Fix some variables not initialized before use. Scan them out using Synopsys tools. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 5 + drivers/gpu/drm/amd/amdgpu/atom.c | 1 +

Re: [PATCH v5 1/6] drm/amdgpu: Support contiguous VRAM allocation

2024-04-23 Thread Felix Kuehling
On 2024-04-23 11:28, Philip Yang wrote: RDMA device with limited scatter-gather ability requires contiguous VRAM buffer allocation for RDMA peer direct support. Add a new KFD alloc memory flag and store as bo alloc flag AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA

Re: [PATCH v5 3/6] drm/amdgpu: Evict BOs from same process for contiguous allocation

2024-04-23 Thread Felix Kuehling
On 2024-04-23 11:28, Philip Yang wrote: When TTM failed to alloc VRAM, TTM try evict BOs from VRAM to system memory then retry the allocation, this skips the KFD BOs from the same process because KFD require all BOs are resident for user queues. If TTM with TTM_PL_FLAG_CONTIGUOUS flag to alloc

Re: [PATCH v5 4/6] drm/amdkfd: Evict BO itself for contiguous allocation

2024-04-23 Thread Felix Kuehling
On 2024-04-23 11:28, Philip Yang wrote: If the BO pages pinned for RDMA is not contiguous on VRAM, evict it to system memory first to free the VRAM space, then allocate contiguous VRAM space, and then move it from system memory back to VRAM. Signed-off-by: Philip Yang ---

Re: [PATCH v5 5/6] drm/amdkfd: Increase KFD bo restore wait time

2024-04-23 Thread Felix Kuehling
On 2024-04-23 11:28, Philip Yang wrote: TTM contiguous VRAM allocation may take more than 1 second to evict BOs for a larger RDMA buffer. Because the KFD restore BO worker reserves all KFD BOs, TTM cannot take the remaining KFD BOs' locks to evict them, which causes TTM to fail to alloc

Re: [PATCH 1/2] drm/amd/display: clean inconsistent indenting

2024-04-23 Thread Rodrigo Siqueira Jordao
On 2/13/24 3:43 PM, Joao Paulo Pereira da Silva wrote: From: jppaulo Clean some wrong indenting that throw errors in checkpatch. Signed-off-by: Joao Paulo Pereira da Silva --- drivers/gpu/drm/amd/display/dc/core/dc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff

Re: [PATCH] drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms

2024-04-23 Thread Felix Kuehling
On 2024-04-22 05:10, Lang Yu wrote: Observed on gfx8 ASIC when KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used. Two attachments use the same VM, root PD would be locked twice. [ 57.910418] Call Trace: [ 57.793726] ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu] [ 57.793820]

Re: [PATCH] drm/amd/display: Remove duplicated function signature from dcn3.01 DCCG

2024-04-23 Thread Rodrigo Siqueira Jordao
On 2/22/24 7:19 AM, David Tadokoro wrote: In the header file dc/dcn301/dcn301_dccg.h, the function dccg301_create is declared twice, so remove duplication. Signed-off-by: David Tadokoro --- drivers/gpu/drm/amd/display/dc/dcn301/dcn301_dccg.h | 6 -- 1 file changed, 6 deletions(-)

Re: [PATCH] drm/amdgpu: Fix VRAM memory accounting

2024-04-23 Thread Felix Kuehling
On 2024-04-23 14:56, Mukul Joshi wrote: Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling ---

Re: [PATCH] drm/amd/display: use mpcc_count to log MPC state

2024-04-23 Thread Rodrigo Siqueira Jordao
On 4/12/24 10:39 AM, Melissa Wen wrote: According to [1]: ``` DTN only logs 'pipe_count' instances of MPCC. However in some cases there are different number of MPCC than DPP (pipe_count). ``` As DTN log still relies on pipe_count to print mpcc state, switch to mpcc_count in all occurrences.

[linux-next:master] BUILD REGRESSION a59668a9397e7245b26e9be85d23f242ff757ae8

2024-04-23 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master branch HEAD: a59668a9397e7245b26e9be85d23f242ff757ae8 Add linux-next specific files for 20240423 Error/Warning reports: https://lore.kernel.org/oe-kbuild-all/202404231839.ohiy9lw8-...@intel.com Error

RE: [PATCH] drm/amdgpu: update fw_share for VCN5

2024-04-23 Thread Dong, Ruijing
[AMD Official Use Only - General] Reviewed-by: Ruijing Dong Thanks, Ruijing -Original Message- From: amd-gfx On Behalf Of Sonny Jiang Sent: Tuesday, April 23, 2024 2:41 PM To: amd-gfx@lists.freedesktop.org Cc: Jiang, Sonny Subject: [PATCH] drm/amdgpu: update fw_share for VCN5

[PATCH] drm/amdgpu: Fix VRAM memory accounting

2024-04-23 Thread Mukul Joshi
Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff

[PATCH] drm/amdgpu: update fw_share for VCN5

2024-04-23 Thread Sonny Jiang
kmd_fw_shared changed in VCN5 Signed-off-by: Sonny Jiang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 5 - drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 10 ++ drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 14 +++--- 3 files changed, 21 insertions(+), 8 deletions(-) diff --git

Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-23 Thread Felix Kuehling
On 2024-04-23 01:50, Christian König wrote: On 22.04.24 at 21:45, Yunxiang Li wrote: Reset request from KFD is missing a check for if a reset is already in progress, this causes a second reset to be triggered right after the previous one finishes. Add the check to align with the other reset

Re: [PATCH RESEND] drm/amd/display: Fix division by zero in setup_dsc_config

2024-04-23 Thread Rodrigo Siqueira Jordao
On 4/22/24 8:35 AM, Jose Fernandez wrote: When slice_height is 0, the division by slice_height in the calculation of the number of slices will cause a division by zero driver crash. This leaves the kernel in a state that requires a reboot. This patch adds a check to avoid the division by

Re: [PATCH v2] drm/amdgpu: Fix two reset triggered in a row

2024-04-23 Thread Christian König
On 23.04.24 at 16:44, Yunxiang Li wrote: Sometimes a hung GPU causes multiple reset sources to schedule resets. If the second source schedules its reset after we call amdgpu_device_stop_pending_resets, it will be able to trigger an unnecessary reset. Move amdgpu_device_stop_pending_resets to after the

Re: [PATCH v3] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-23 Thread Christian König
On 23.04.24 at 16:31, Tim Huang wrote: From: Tim Huang Clear warning that uses uninitialized value fw_size. Signed-off-by: Tim Huang Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git

Re: [PATCH] drm/amd/display: Remove unnecessary NULL check in dcn20_set_input_transfer_func

2024-04-23 Thread Harry Wentland
On 2024-04-23 09:59, Srinivasan Shanmugam wrote: > This commit removes an unnecessary NULL check in the > `dcn20_set_input_transfer_func` function in the `dcn20_hwseq.c` file. > The variable `tf` is assigned the address of > `plane_state->in_transfer_func` unconditionally, so it can never be >

[PATCH v5 6/6] drm/amdkfd: Bump kfd version for contiguous VRAM allocation

2024-04-23 Thread Philip Yang
Bump the kfd ioctl minor version to declare the contiguous VRAM allocation flag support. Signed-off-by: Philip Yang --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index

[PATCH v5 4/6] drm/amdkfd: Evict BO itself for contiguous allocation

2024-04-23 Thread Philip Yang
If the BO pages pinned for RDMA is not contiguous on VRAM, evict it to system memory first to free the VRAM space, then allocate contiguous VRAM space, and then move it from system memory back to VRAM. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 16

[PATCH v5 2/6] drm/amdgpu: Handle sg size limit for contiguous allocation

2024-04-23 Thread Philip Yang
Define macro MAX_SG_SEGMENT_SIZE 2GB, because struct scatterlist length is unsigned int, and some users of it cast to a signed int, so every segment of sg table is limited to size 2GB maximum. For contiguous VRAM allocation, don't limit the max buddy block size in order to get contiguous VRAM
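The 2GB per-segment cap described in this patch can be sketched in plain C. This is a minimal userspace illustration under stated assumptions, not the driver code: `sg_segment_count` and `sg_segment_len` are hypothetical helpers, and the macro value simply mirrors the 2GB figure given in the commit text.

```c
#include <stdint.h>

/* 2GB cap per scatter-gather segment, as stated in the commit text. */
#define MAX_SG_SEGMENT_SIZE (UINT64_C(2) << 30)

/* Hypothetical helper: number of segments needed to cover len bytes. */
uint64_t sg_segment_count(uint64_t len)
{
	return (len + MAX_SG_SEGMENT_SIZE - 1) / MAX_SG_SEGMENT_SIZE;
}

/* Hypothetical helper: length of segment idx (0-based), 0 if out of range. */
uint64_t sg_segment_len(uint64_t len, uint64_t idx)
{
	uint64_t offset;

	if (idx >= sg_segment_count(len))
		return 0;

	offset = idx * MAX_SG_SEGMENT_SIZE;
	if (len - offset > MAX_SG_SEGMENT_SIZE)
		return MAX_SG_SEGMENT_SIZE;

	/* Last segment carries whatever remains. */
	return len - offset;
}
```

Under these assumptions, a 5GB contiguous range would be described by three segments of 2GB, 2GB, and 1GB, even though the underlying VRAM block itself is not split.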

[PATCH v5 3/6] drm/amdgpu: Evict BOs from same process for contiguous allocation

2024-04-23 Thread Philip Yang
When TTM failed to alloc VRAM, TTM try evict BOs from VRAM to system memory then retry the allocation, this skips the KFD BOs from the same process because KFD require all BOs are resident for user queues. If TTM with TTM_PL_FLAG_CONTIGUOUS flag to alloc contiguous VRAM, allow TTM evict KFD BOs

[PATCH v5 5/6] drm/amdkfd: Increase KFD bo restore wait time

2024-04-23 Thread Philip Yang
TTM contiguous VRAM allocation may take more than 1 second to evict BOs for a larger RDMA buffer. Because the KFD restore BO worker reserves all KFD BOs, TTM cannot take the remaining KFD BOs' locks to evict them, which causes TTM to fail to alloc contiguous VRAM. Increase the KFD restore BO wait

[PATCH v5 1/6] drm/amdgpu: Support contiguous VRAM allocation

2024-04-23 Thread Philip Yang
RDMA device with limited scatter-gather ability requires contiguous VRAM buffer allocation for RDMA peer direct support. Add a new KFD alloc memory flag and store as bo alloc flag AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA peerdirect access, this will set

[PATCH v5 0/6] Best effort contiguous VRAM allocation

2024-04-23 Thread Philip Yang
This patch series implement new KFD memory alloc flag for best effort contiguous VRAM allocation, to support peer direct access RDMA device with limited scatter-gather dma capability. v2: rebase on patch ("drm/amdgpu: Modify the contiguous flags behaviour") to avoid adding the new GEM flag

[PATCH v2] drm/amdgpu: Fix two reset triggered in a row

2024-04-23 Thread Yunxiang Li
Sometimes a hung GPU causes multiple reset sources to schedule resets. If the second source schedules its reset after we call amdgpu_device_stop_pending_resets, it will be able to trigger an unnecessary reset. Move amdgpu_device_stop_pending_resets to after the reset is already done, since any reset

[PATCH v3] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-23 Thread Tim Huang
From: Tim Huang Clear warning that uses uninitialized value fw_size. Signed-off-by: Tim Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c

RE: [PATCH v2] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-23 Thread Huang, Tim
[AMD Official Use Only - General] -Original Message- From: Koenig, Christian Sent: Tuesday, April 23, 2024 7:30 PM To: Huang, Tim ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: Re: [PATCH v2] drm/amdgpu: fix uninitialized scalar variable warning Am 23.04.24 um 10:43

Re: [PATCH 1/2] drm/print: drop include debugfs.h and include where needed

2024-04-23 Thread Matt Coster
On 22/04/2024 13:10, Jani Nikula wrote: > Surprisingly many places depend on debugfs.h to be included via > drm_print.h. Fix them. > > v3: Also fix armada, ite-it6505, imagination, msm, sti, vc4, and xe > > v2: Also fix ivpu and vmwgfx > > Reviewed-by: Andrzej Hajda > Acked-by: Maxime Ripard

Re: [PATCH 1/2] drm/amdgpu: add a spinlock to wb allocation

2024-04-23 Thread Alex Deucher
On Tue, Apr 23, 2024 at 9:58 AM Christian König wrote: > > On 23.04.24 at 15:18, Alex Deucher wrote: > > On Tue, Apr 23, 2024 at 2:57 AM Christian König > > wrote: > >> On 22.04.24 at 16:37, Alex Deucher wrote: > >>> As we use wb slots more dynamically, we need to lock > >>> access to avoid

[PATCH] drm/amd/display: Remove unnecessary NULL check in dcn20_set_input_transfer_func

2024-04-23 Thread Srinivasan Shanmugam
This commit removes an unnecessary NULL check in the `dcn20_set_input_transfer_func` function in the `dcn20_hwseq.c` file. The variable `tf` is assigned the address of `plane_state->in_transfer_func` unconditionally, so it can never be `NULL`. Therefore, the check `if (tf == NULL)` is unnecessary

Re: [PATCH 1/2] drm/amdgpu: add a spinlock to wb allocation

2024-04-23 Thread Christian König
On 23.04.24 at 15:18, Alex Deucher wrote: On Tue, Apr 23, 2024 at 2:57 AM Christian König wrote: On 22.04.24 at 16:37, Alex Deucher wrote: As we use wb slots more dynamically, we need to lock access to avoid racing on allocation or free. Wait a second. Why are we using the wb slots

Re: [PATCH v4 6/7] drm/amdgpu: Skip dma map resource for null RDMA device

2024-04-23 Thread Philip Yang
On 2024-04-23 09:32, Christian König wrote: On 23.04.24 at 15:04, Philip Yang wrote: To test RDMA using dummy driver on the system without NIC/RDMA device, the get/put dma pages pass in null device pointer, skip

Re: [PATCH v4 6/7] drm/amdgpu: Skip dma map resource for null RDMA device

2024-04-23 Thread Christian König
On 23.04.24 at 15:04, Philip Yang wrote: To test RDMA using dummy driver on the system without NIC/RDMA device, the get/put dma pages pass in null device pointer, skip the dma map/unmap resource and sg table to avoid null pointer access. Well just to make it clear this patch is really a no-go

Re: [PATCH v4 2/7] drm/amdgpu: Handle sg size limit for contiguous allocation

2024-04-23 Thread Christian König
On 23.04.24 at 15:04, Philip Yang wrote: Define macro MAX_SG_SEGMENT_SIZE 2GB, because struct scatterlist length is unsigned int, and some users of it cast to a signed int, so every segment of sg table is limited to size 2GB maximum. For contiguous VRAM allocation, don't limit the max buddy

Re: [PATCH 1/2] drm/amdgpu: add a spinlock to wb allocation

2024-04-23 Thread Alex Deucher
On Tue, Apr 23, 2024 at 2:57 AM Christian König wrote: > > On 22.04.24 at 16:37, Alex Deucher wrote: > > As we use wb slots more dynamically, we need to lock > > access to avoid racing on allocation or free. > > Wait a second. Why are we using the wb slots dynamically? > See patch 2. I needed

[PATCH v4 1/7] drm/amdgpu: Support contiguous VRAM allocation

2024-04-23 Thread Philip Yang
RDMA device with limited scatter-gather ability requires contiguous VRAM buffer allocation for RDMA peer direct support. Add a new KFD alloc memory flag and store as bo alloc flag AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA peerdirect access, this will set

[PATCH v4 3/7] drm/amdgpu: Evict BOs from same process for contiguous allocation

2024-04-23 Thread Philip Yang
When TTM failed to alloc VRAM, TTM try evict BOs from VRAM to system memory then retry the allocation, this skips the KFD BOs from the same process because KFD require all BOs are resident for user queues. If TTM with TTM_PL_FLAG_CONTIGUOUS flag to alloc contiguous VRAM, allow TTM evict KFD BOs

[PATCH v4 7/7] drm/amdkfd: Bump kfd version for contiguous VRAM allocation

2024-04-23 Thread Philip Yang
Bump the kfd ioctl minor version to declare the contiguous VRAM allocation flag support. Signed-off-by: Philip Yang --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index

[PATCH v4 4/7] drm/amdkfd: Evict BO itself for contiguous allocation

2024-04-23 Thread Philip Yang
If the BO pages pinned for RDMA is not contiguous on VRAM, evict it to system memory first to free the VRAM space, then allocate contiguous VRAM space, and then move it from system memory back to VRAM. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 16

[PATCH v4 6/7] drm/amdgpu: Skip dma map resource for null RDMA device

2024-04-23 Thread Philip Yang
To test RDMA using dummy driver on the system without NIC/RDMA device, the get/put dma pages pass in null device pointer, skip the dma map/unmap resource and sg table to avoid null pointer access. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 33

[PATCH v4 5/7] drm/amdkfd: Increase KFD bo restore wait time

2024-04-23 Thread Philip Yang
TTM contiguous VRAM allocation may take more than 1 second to evict BOs for a larger RDMA buffer. Because the KFD restore BO worker reserves all KFD BOs, TTM cannot take the remaining KFD BOs' locks to evict them, which causes TTM to fail to alloc contiguous VRAM. Increase the KFD restore BO wait

[PATCH v4 2/7] drm/amdgpu: Handle sg size limit for contiguous allocation

2024-04-23 Thread Philip Yang
Define macro MAX_SG_SEGMENT_SIZE 2GB, because struct scatterlist length is unsigned int, and some users of it cast to a signed int, so every segment of sg table is limited to size 2GB maximum. For contiguous VRAM allocation, don't limit the max buddy block size in order to get contiguous VRAM

[PATCH v4 0/7] Best effort contiguous VRAM allocation

2024-04-23 Thread Philip Yang
This patch series implement new KFD memory alloc flag for best effort contiguous VRAM allocation, to support peer direct access RDMA device with limited scatter-gather dma capability. v2: rebase on patch ("drm/amdgpu: Modify the contiguous flags behaviour") to avoid adding the new GEM flag

Re: [PATCH v1 3/4] drm/ci: uprev IGT and generate testlist from build

2024-04-23 Thread Vignesh Raman
Hi, On 23/04/24 17:53, Dmitry Baryshkov wrote: On Tue, 23 Apr 2024 at 13:24, Maíra Canal wrote: On 4/23/24 01:02, Vignesh Raman wrote: Uprev IGT to the latest version and stop vendoring the testlist into the kernel. Instead, use the testlist from the IGT build to ensure we do not miss

Re: [PATCH v1 3/4] drm/ci: uprev IGT and generate testlist from build

2024-04-23 Thread Dmitry Baryshkov
On Tue, 23 Apr 2024 at 13:24, Maíra Canal wrote: > > On 4/23/24 01:02, Vignesh Raman wrote: > > Uprev IGT to the latest version and stop vendoring the > > testlist into the kernel. Instead, use the testlist from > > the IGT build to ensure we do not miss renamed or newly > > added tests. > >

[PATCH AUTOSEL 6.6 14/16] drm/radeon: silence UBSAN warning (v3)

2024-04-23 Thread Sasha Levin
From: Alex Deucher [ Upstream commit 781d41fed19caf900c8405064676813dc9921d32 ] Convert a variable sized array from [1] to []. v2: fix up a few more. v3: integrate comments from Kees. Reviewed-by: Kees Cook Tested-by: Jeff Johnson (v2) Acked-by: Christian König (v1) Signed-off-by: Alex

[PATCH AUTOSEL 6.8 16/18] drm/radeon: silence UBSAN warning (v3)

2024-04-23 Thread Sasha Levin
From: Alex Deucher [ Upstream commit 781d41fed19caf900c8405064676813dc9921d32 ] Convert a variable sized array from [1] to []. v2: fix up a few more. v3: integrate comments from Kees. Reviewed-by: Kees Cook Tested-by: Jeff Johnson (v2) Acked-by: Christian König (v1) Signed-off-by: Alex

Re: [PATCH] drm/amdgpu: add error handle to avoid out-of-bounds

2024-04-23 Thread Christian König
On 23.04.24 at 11:15, Bob Zhou wrote: If sdma_v4_0_irq_id_to_seq returns -EINVAL, processing should stop to avoid an out-of-bounds read, so return -EINVAL directly. Signed-off-by: Bob Zhou Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 3 +++ 1 file

Re: [PATCH v2] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-23 Thread Christian König
On 23.04.24 at 10:43, Tim Huang wrote: From: Tim Huang Clear warning that uses uninitialized value fw_size. Signed-off-by: Tim Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git

[PATCH] Revert "drm/amdkfd: Add partition id field to location_id"

2024-04-23 Thread Lijo Lazar
This reverts commit 925c7bd1d1cf9f173b22603c8bd4816d142d4935. RCCL library is currently not treating spatial partitions differently, hence this change is causing issues. Revert temporarily till RCCL implementation is ready for spatial partitions. Signed-off-by: Lijo Lazar ---

Re: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-23 Thread Christian König
On 23.04.24 at 10:12, Huang, Tim wrote: [AMD Official Use Only - General] -Original Message- From: amd-gfx On Behalf Of Huang, Tim Sent: Tuesday, April 23, 2024 4:01 PM To: Koenig, Christian ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: RE: [PATCH] drm/amdgpu: fix

Re: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-23 Thread Christian König
The problem is that it's a hit all case and that's usually seen as bad coding style. In other words when one branch by accident forgets to set the fw_size we wouldn't get a warning any more and just use zero. Please rather add setting the fw_size to zero to the default branch and maybe even
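The review point above can be illustrated with a small standalone C sketch. It is not the amdgpu code: `enum fw_kind`, the size values, and `fw_size_for` are hypothetical names chosen only to show the pattern of assigning the variable in every branch, including `default`, instead of initializing it at its declaration.

```c
#include <stddef.h>

/* Hypothetical firmware kinds; not the driver's real enum. */
enum fw_kind { FW_KIND_A, FW_KIND_B, FW_KIND_UNKNOWN };

/*
 * fw_size is deliberately NOT initialized at its declaration. Every
 * branch, including default, assigns it, so a branch that forgets the
 * assignment can still be flagged by the compiler or a static analyzer.
 */
size_t fw_size_for(enum fw_kind kind)
{
	size_t fw_size;

	switch (kind) {
	case FW_KIND_A:
		fw_size = 4096;
		break;
	case FW_KIND_B:
		fw_size = 8192;
		break;
	default:
		/* Unknown kind: explicitly report that there is nothing to copy. */
		fw_size = 0;
		break;
	}

	return fw_size;
}
```

With a blanket `size_t fw_size = 0;` at the declaration, deleting the assignment in one `case` would silently yield zero, which is the failure mode the review comment describes.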

[PATCH] drm/amdgpu: fix MES GFX mask

2024-04-23 Thread Shashank Sharma
Current MES GFX mask prevents FW to enable oversubscription. This patch does the following: - Fixes the mask values and adds a description for the same - Removes the central mask setup and makes it IP specific, as it would be different when the number of pipes and queues are different. Cc:

[PATCH] drm/amdgpu: add error handle to avoid out-of-bounds

2024-04-23 Thread Bob Zhou
If sdma_v4_0_irq_id_to_seq returns -EINVAL, processing should stop to avoid an out-of-bounds read, so return -EINVAL directly. Signed-off-by: Bob Zhou --- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c

[PATCH v2] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-23 Thread Tim Huang
From: Tim Huang Clear warning that uses uninitialized value fw_size. Signed-off-by: Tim Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c

RE: [PATCH 3/3] drm/amdgpu: add the amdgpu buffer object move speed metrics

2024-04-23 Thread Liang, Prike
[Public] Hi, Christian The basic idea is to collect the following performance data and export this raw data into a centralized debugfs. This raw data may help in performance tuning from the AMDGPU kernel driver side. Additionally, this performance data should be easily used for tool libraries

RE: [PATCH 2/2] drm/amdgpu: fix uninitialized variable warning

2024-04-23 Thread Zhou, Bob
[AMD Official Use Only - General] Hi Christian Agree with you, returning an error is surely a better modification. I will send v2 patch to fix this. Regards, Bob -Original Message- From: Koenig, Christian Sent: April 23, 2024 15:41 To: Zhou, Bob ; Koenig, Christian ;

[PATCH 4/4] drm/amdgpu: avoid dump mca bank log muti times during ras ISR

2024-04-23 Thread Yang Wang
Because the UE valid MCA count is only cleared after a GPU reset, dump the MCA log only the first time the MCA bank is read after receiving a RAS interrupt. Signed-off-by: Yang Wang --- drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c | 28 + drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h

[PATCH 3/4] drm/amdgpu: add MCA smu cache support

2024-04-23 Thread Yang Wang
v1: because SMU CE valid mca bank will be cleared after reading, this patch adds mca cache at the driver level to ensure that the mca bank is not lost. v2: refine amdgpu_mca_init/fini/reset() function name. v3: add mca_cache.lock support only add CE bank to mca bank cache. Signed-off-by: Yang

[PATCH 2/4] drm/amdgpu: add amdgpu MCA bank dispatch function support

2024-04-23 Thread Yang Wang
- Refine mca driver code. - Centralize mca bank dispatch code logic. Signed-off-by: Yang Wang Reviewed-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c | 97 ++--- 1 file changed, 55 insertions(+), 42 deletions(-) diff --git

[PATCH 1/4] drm/amdgpu: remove unused MCA driver codes

2024-04-23 Thread Yang Wang
- remove unused callback functions. - make part of mca functions static and refine the function order. Signed-off-by: Yang Wang Reviewed-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c | 199 -- drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h | 16 --

RE: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-23 Thread Huang, Tim
[AMD Official Use Only - General] -Original Message- From: amd-gfx On Behalf Of Huang, Tim Sent: Tuesday, April 23, 2024 4:01 PM To: Koenig, Christian ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: RE: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning [AMD

RE: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-23 Thread Huang, Tim
[AMD Official Use Only - General] Hi Christian, -Original Message- From: Koenig, Christian Sent: Tuesday, April 23, 2024 3:43 PM To: Huang, Tim ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Koenig, Christian Subject: Re: [PATCH] drm/amdgpu: fix uninitialized scalar variable

Re: [PATCH] drm/amd/display: Fix division by zero in setup_dsc_config

2024-04-23 Thread Markus Elfring
… > +++ b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c > @@ -1055,7 +1055,12 @@ static bool setup_dsc_config( > if (!is_dsc_possible) > goto done; > > - dsc_cfg->num_slices_v = pic_height/slice_height; > + if (slice_height > 0) > + dsc_cfg->num_slices_v =

[PATCH RESEND] drm/amd/display: Fix division by zero in setup_dsc_config

2024-04-23 Thread Jose Fernandez
When slice_height is 0, the division by slice_height in the calculation of the number of slices will cause a division by zero driver crash. This leaves the kernel in a state that requires a reboot. This patch adds a check to avoid the division by zero. The stack trace below is for the 6.8.4
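
The guard described in this message can be sketched in plain C as follows. This is a minimal user-space illustration, not the patch itself: `struct dsc_cfg_sketch` and `set_num_slices_v()` are hypothetical names, and only the `num_slices_v`, `pic_height` and `slice_height` identifiers come from the quoted diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the one field of the DSC config used here. */
struct dsc_cfg_sketch {
	uint32_t num_slices_v;
};

/*
 * Compute the vertical slice count, refusing to divide when
 * slice_height is 0. Returns false so the caller can treat the
 * configuration as "DSC not possible" instead of crashing.
 */
static bool set_num_slices_v(struct dsc_cfg_sketch *cfg,
			     uint32_t pic_height, uint32_t slice_height)
{
	if (slice_height == 0)
		return false;	/* cfg is left untouched */

	cfg->num_slices_v = pic_height / slice_height;
	return true;
}
```

A caller would check the return value and bail out before using `num_slices_v`; how the real driver reports that failure is not visible in the truncated snippet.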

Re: [PATCH] drm/amdgpu: Fixup bad vram size on gmc v6 and v7

2024-04-23 Thread Qiang Ma
On Mon, 22 Apr 2024 14:59:36 +0200 Christian König wrote: > Am 22.04.24 um 14:33 schrieb Qiang Ma: > > On Mon, 22 Apr 2024 11:40:26 +0200 > > Christian König wrote: > > > >> Am 22.04.24 um 07:26 schrieb Qiang Ma: > >>> Some boards(like Oland PRO: 0x1002:0x6613) seem to have > >>> garbage in

Re: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-23 Thread Christian König
Am 23.04.24 um 08:28 schrieb Tim Huang: Clear warning that uses uninitialized value fw_size. In which case is the fw_size uninitialized and why setting it to zero helps in that case? Regards, Christian. Signed-off-by: Tim Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- 1

Re: [PATCH 2/2] drm/amdgpu: fix uninitialized variable warning

2024-04-23 Thread Christian König
In this case we should modify amdgpu_i2c_get_byte() to return an error and prevent writing the value back. See zero is as random as any other value and initializing the variable here doesn't really help, it just makes your warning disappear. Regards, Christian. Am 23.04.24 um 08:27 schrieb
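
The suggestion above (have the read helper return an error so that the caller never writes back a meaningless value) could look roughly like this. It is a user-space sketch under stated assumptions: `xfer_fn`, `fake_xfer_ok()`, `fake_xfer_fail()`, `get_byte_checked()` and `set_bits_checked()` are all hypothetical, and the transfer callback only imitates the "returns 2 on success" convention seen in the quoted driver code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical transfer hook: returns the number of messages completed. */
typedef int (*xfer_fn)(uint8_t *in_buf);

/* Fake transfers for illustration only. */
static int fake_xfer_ok(uint8_t *in_buf)
{
	in_buf[0] = 0x42;
	return 2;
}

static int fake_xfer_fail(uint8_t *in_buf)
{
	(void)in_buf;
	return 0;
}

/* Read one byte; on failure return -EIO and leave *val untouched. */
static int get_byte_checked(xfer_fn xfer, uint8_t *val)
{
	uint8_t in_buf[1] = { 0 };

	if (xfer(in_buf) != 2)
		return -EIO;

	*val = in_buf[0];
	return 0;
}

/* Read-modify-write that skips the write-back when the read failed. */
static int set_bits_checked(xfer_fn xfer, uint8_t mask, uint8_t *out)
{
	uint8_t val;
	int ret = get_byte_checked(xfer, &val);

	if (ret)
		return ret;	/* nothing is written back */

	*out = val | mask;
	return 0;
}
```

With this shape, zero-initializing the variable is no longer needed to silence the warning, because `val` is only read on the success path.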

RE: [PATCH 2/2] drm/amdgpu: fix uninitialized variable warning

2024-04-23 Thread Zhou, Bob
[Public] Thanks for your comments. I should clarify the issue. As you see the amdgpu_i2c_get_byte code: if (i2c_transfer(&i2c_bus->adapter, msgs, 2) == 2) { *val = in_buf[0]; DRM_DEBUG("val = 0x%02x\n", *val); } else {

[PATCH] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-23 Thread Tim Huang
Clear warning that uses uninitialized value fw_size. Signed-off-by: Tim Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index

RE: [PATCH 2/2] drm/amdgpu: fix uninitialized variable warning

2024-04-23 Thread Zhou, Bob
[AMD Official Use Only - General] Thanks for your comments. I should clarify the issue. As you see the amdgpu_i2c_get_byte code: if (i2c_transfer(&i2c_bus->adapter, msgs, 2) == 2) { *val = in_buf[0]; DRM_DEBUG("val = 0x%02x\n", *val);

Re: [PATCH v2] drm/amdgpu: IB test encode test package change for VCN5

2024-04-23 Thread Christian König
Am 22.04.24 um 21:59 schrieb Sonny Jiang: From: Sonny Jiang VCN5 session info package interface changed Signed-off-by: Sonny Jiang Mhm, in general we should push back on FW changes which makes stuff like that necessary. So what is the justification? If the FW has a good justification

Re: [PATCH 1/2] drm/amdgpu: add a spinlock to wb allocation

2024-04-23 Thread Christian König
Am 22.04.24 um 16:37 schrieb Alex Deucher: As we use wb slots more dynamically, we need to lock access to avoid racing on allocation or free. Wait a second. Why are we using the wb slots dynamically? The number of slots made available is statically calculated, when this is suddenly used
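
For readers unfamiliar with the race the quoted patch description refers to, here is a small user-space sketch of slot allocation guarded by a lock. Everything in it is an assumption made for illustration: the `wb_pool` type, the `WB_SLOTS` count, and the C11 `atomic_flag` spinlock standing in for a kernel spinlock. It does not reflect how the driver actually sizes or hands out wb slots, which is the question raised in this reply.

```c
#include <stdatomic.h>
#include <stdint.h>

#define WB_SLOTS 8	/* hypothetical slot count */

struct wb_pool {
	atomic_flag lock;	/* stand-in for a kernel spinlock */
	uint32_t used;		/* bit i set means slot i is allocated */
};

static void wb_lock(struct wb_pool *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void wb_unlock(struct wb_pool *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Return the lowest free slot index, or -1 if the pool is full. */
static int wb_get(struct wb_pool *p)
{
	int slot = -1;

	wb_lock(p);
	for (int i = 0; i < WB_SLOTS; i++) {
		if (!(p->used & (1u << i))) {
			p->used |= 1u << i;
			slot = i;
			break;
		}
	}
	wb_unlock(p);
	return slot;
}

/* Release a slot; out-of-range indices are ignored. */
static void wb_free(struct wb_pool *p, int slot)
{
	if (slot < 0 || slot >= WB_SLOTS)
		return;

	wb_lock(p);
	p->used &= ~(1u << slot);
	wb_unlock(p);
}
```

Without the lock, two threads could both see the same bit clear and return the same slot; holding the lock across the find-and-set step is what prevents that.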

Re: [PATCH 3/3] drm/amdgpu: Fix Uninitialized scalar variable warning

2024-04-23 Thread Christian König
Am 23.04.24 um 04:53 schrieb Ma, Jun: unsigned int client_id, src_id; struct amdgpu_irq_src *src; bool handled = false; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index 924baf58e322..f0a63d084b4d 100644 ---

RE: [PATCH] drm/amdgpu: Fix ras mode2 reset failure in ras aca mode

2024-04-23 Thread Chai, Thomas
[AMD Official Use Only - General] OK - Best Regards, Thomas -Original Message- From: Zhang, Hawking Sent: Tuesday, April 23, 2024 11:27 AM To: Chai, Thomas ; amd-gfx@lists.freedesktop.org Cc: Zhou1, Tao ; Li, Candice ; Wang, Yang(Kevin) ; Yang, Stanley Subject: RE:

Re: [PATCH 2/2] drm/amdgpu: fix uninitialized variable warning

2024-04-23 Thread Christian König
Am 23.04.24 um 07:33 schrieb Bob Zhou: Because the val isn't initialized, a random variable is set by amdgpu_i2c_put_byte. So fix the uninitialized issue. Well that isn't correct. See the code here:     amdgpu_i2c_get_byte(amdgpu_connector->router_bus,