Fix ras mode2 reset failure in ras aca mode.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index edb3cd0cef96..11a70991152c 100644
In interrupt context, the write to dbg_ev_file is run by a work queue. This
can cause the write to dbg_ev_file to execute after debug_trap_disable, which
causes a NULL pointer access.
Signed-off-by: Lin.Cao
---
drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
On Tue, Apr 23, 2024 at 11:07 PM wrote:
>
> From: Jesse Zhang
>
> The parameter "last_jump_jiffies" should be initialized before being used in
> the function atom_op_jump.
>
> Signed-off-by: Jesse Zhang
Reviewed-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/atom.c | 1 +
> 1 file
On Tue, Apr 23, 2024 at 11:27 PM wrote:
>
> From: Jesse Zhang
>
> check if ring is not mes queue before free wb entry.
Minor clarification to the commit text:
Check if ring is not a mes queue before freeing the wb entry because we only
allocate a wb entry when it's not a mes queue.
With that
From: Jesse Zhang
Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x0301.
Signed-off-by: Jesse Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
Skip creating the 'xxx_err_count' node when ACA is enabled.
Signed-off-by: Yang Wang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 ++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
From: Jesse Zhang
Initialize new_state.jpeg before it is used.
Signed-off-by: Jesse Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index
From: Jesse Zhang
The parameter "last_jump_jiffies" should be initialized before being used in
the function atom_op_jump.
Signed-off-by: Jesse Zhang
---
drivers/gpu/drm/amd/amdgpu/atom.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/atom.c
From: Jesse Zhang
Check that the ring is not a mes queue before freeing the wb entry.
Signed-off-by: Jesse Zhang
---
drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 ++-
3 files changed, 6 insertions(+), 3
On Tue, Apr 23, 2024 at 10:30 PM Qiang Ma wrote:
>
> On Mon, 22 Apr 2024 16:47:36 +0200
> Christian König wrote:
>
> > Am 22.04.24 um 16:40 schrieb Alex Deucher:
> > > On Mon, Apr 22, 2024 at 9:00 AM Christian König
> > > wrote:
> > >> Am 22.04.24 um 14:33 schrieb Qiang Ma:
> > >>> On Mon, 22
Fix Leo's address.
On Tue, Apr 23, 2024 at 10:33 PM Alex Deucher wrote:
>
> On Tue, Apr 23, 2024 at 10:04 PM Zhang, Jesse(Jie)
> wrote:
> >
> > [AMD Official Use Only - General]
> >
> > Hi Alex,
> >
> > -Original Message-
> > From: Alex Deucher
> > Sent: Wednesday, April 24, 2024 9:46
On Tue, Apr 23, 2024 at 10:04 PM Zhang, Jesse(Jie) wrote:
>
> [AMD Official Use Only - General]
>
> Hi Alex,
>
> -Original Message-
> From: Alex Deucher
> Sent: Wednesday, April 24, 2024 9:46 AM
> To: Zhang, Jesse(Jie)
> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> ; Koenig,
[AMD Official Use Only - General]
Hi Alex,
-Original Message-
From: Alex Deucher
Sent: Wednesday, April 24, 2024 9:46 AM
To: Zhang, Jesse(Jie)
Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
; Koenig, Christian ;
Huang, Tim
Subject: Re: [PATCH] drm/amdgpu: fix some
On Tue, Apr 23, 2024 at 9:27 PM Jesse Zhang wrote:
>
> Fix some variables not initialized before use.
> Scan them out using Synopsys tools.
>
> Signed-off-by: Jesse Zhang
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 5 +
>
This commit removes an unnecessary NULL check in the
`dcn10_set_input_transfer_func` function in the `dcn10_hwseq.c` file.
The variable `tf` is assigned the address of
`plane_state->in_transfer_func` unconditionally, so it can never be
`NULL`. Therefore, the check `if (tf == NULL)` is unnecessary
This commit removes a redundant NULL check in the
`dce110_set_input_transfer_func` function in the `dce110_hwseq.c` file.
The variable `tf` is assigned the address of
`plane_state->in_transfer_func` unconditionally, so it can never be
`NULL`. Therefore, the check `if (tf == NULL)` is unnecessary
This commit removes an unnecessary NULL check in the
`dcn10_set_input_transfer_func` function in the `dcn10_hwseq.c` file.
The variable `tf` is assigned the address of
`plane_state->in_transfer_func` unconditionally, so it can never be
`NULL`. Therefore, the check `if (tf == NULL)` is unnecessary
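The reasoning in these commit messages can be illustrated outside the kernel. The following is a minimal sketch, not the driver code: the struct layouts and the helper `sketch_get_input_tf` are hypothetical stand-ins, and it assumes that `in_transfer_func` is a member embedded in the plane state rather than a pointer.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct sketch_transfer_func {
    int type;
};

struct sketch_plane_state {
    int layer_index;
    struct sketch_transfer_func in_transfer_func; /* embedded, not a pointer */
};

/*
 * Returns the address of the embedded member. For a valid plane_state this
 * is the address of plane_state plus a fixed offset, so it is never NULL.
 */
const struct sketch_transfer_func *
sketch_get_input_tf(const struct sketch_plane_state *plane_state)
{
    return &plane_state->in_transfer_func;
}
```

Under that assumption, a `tf == NULL` test after such an assignment can only be dead code. A check on `plane_state` itself would be a separate question.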
Fix some variables not initialized before use.
Scan them out using Synopsys tools.
Signed-off-by: Jesse Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 5 +
drivers/gpu/drm/amd/amdgpu/atom.c | 1 +
On 2024-04-23 11:28, Philip Yang wrote:
RDMA device with limited scatter-gather ability requires contiguous VRAM
buffer allocation for RDMA peer direct support.
Add a new KFD alloc memory flag and store as bo alloc flag
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA
On 2024-04-23 11:28, Philip Yang wrote:
When TTM fails to alloc VRAM, it tries to evict BOs from VRAM to system
memory and then retries the allocation. This skips the KFD BOs from the same
process, because KFD requires all BOs to be resident for user queues.
If TTM with TTM_PL_FLAG_CONTIGUOUS flag to alloc
On 2024-04-23 11:28, Philip Yang wrote:
If the BO pages pinned for RDMA are not contiguous in VRAM, evict the BO to
system memory first to free the VRAM space, then allocate contiguous
VRAM space, and then move it from system memory back to VRAM.
Signed-off-by: Philip Yang
---
On 2024-04-23 11:28, Philip Yang wrote:
TTM allocation of contiguous VRAM may take more than 1 second to evict BOs
for a larger RDMA buffer. Because the KFD restore BO worker reserves all
KFD BOs, TTM cannot take the remaining KFD BO locks to evict them,
which causes TTM to fail to alloc
On 2/13/24 3:43 PM, Joao Paulo Pereira da Silva wrote:
From: jppaulo
Clean up some wrong indentation that triggers errors in checkpatch.
Signed-off-by: Joao Paulo Pereira da Silva
---
drivers/gpu/drm/amd/display/dc/core/dc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff
On 2024-04-22 05:10, Lang Yu wrote:
Observed on gfx8 ASIC when KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used.
Two attachments use the same VM, so the root PD would be locked twice.
[ 57.910418] Call Trace:
[ 57.793726] ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu]
[ 57.793820]
On 2/22/24 7:19 AM, David Tadokoro wrote:
In the header file dc/dcn301/dcn301_dccg.h, the function dccg301_create
is declared twice, so remove duplication.
Signed-off-by: David Tadokoro
---
drivers/gpu/drm/amd/display/dc/dcn301/dcn301_dccg.h | 6 --
1 file changed, 6 deletions(-)
On 2024-04-23 14:56, Mukul Joshi wrote:
Subtract the VRAM pinned memory when checking for available memory
in amdgpu_amdkfd_reserve_mem_limit function since that memory is not
available for use.
Signed-off-by: Mukul Joshi
Reviewed-by: Felix Kuehling
---
On 4/12/24 10:39 AM, Melissa Wen wrote:
According to [1]:
```
DTN only logs 'pipe_count' instances of MPCC. However in some cases
there are different number of MPCC than DPP (pipe_count).
```
As DTN log still relies on pipe_count to print mpcc state, switch to
mpcc_count in all occurrences.
tree/branch:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: a59668a9397e7245b26e9be85d23f242ff757ae8 Add linux-next specific
files for 20240423
Error/Warning reports:
https://lore.kernel.org/oe-kbuild-all/202404231839.ohiy9lw8-...@intel.com
Error
[AMD Official Use Only - General]
Reviewed-by: Ruijing Dong
Thanks,
Ruijing
-Original Message-
From: amd-gfx On Behalf Of Sonny Jiang
Sent: Tuesday, April 23, 2024 2:41 PM
To: amd-gfx@lists.freedesktop.org
Cc: Jiang, Sonny
Subject: [PATCH] drm/amdgpu: update fw_share for VCN5
Subtract the VRAM pinned memory when checking for available memory
in amdgpu_amdkfd_reserve_mem_limit function since that memory is not
available for use.
Signed-off-by: Mukul Joshi
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff
kmd_fw_shared changed in VCN5
Signed-off-by: Sonny Jiang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 5 -
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 10 ++
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 14 +++---
3 files changed, 21 insertions(+), 8 deletions(-)
diff --git
On 2024-04-23 01:50, Christian König wrote:
Am 22.04.24 um 21:45 schrieb Yunxiang Li:
The reset request from KFD is missing a check for whether a reset is already in
progress, which causes a second reset to be triggered right after the
previous one finishes. Add the check to align with the other reset
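The idea of rejecting a reset request while another reset is still running can be sketched in portable C11. This is only an illustration under stated assumptions: `struct sketch_dev` and the `sketch_*` helpers are hypothetical names, and the driver uses its own in-reset tracking rather than this code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device state, not the amdgpu structure. */
struct sketch_dev {
    atomic_bool in_reset; /* true while a reset is running */
    int resets_run;       /* number of completed resets */
};

void sketch_dev_init(struct sketch_dev *dev)
{
    atomic_init(&dev->in_reset, false);
    dev->resets_run = 0;
}

/*
 * Returns true if the caller claimed the reset, false if a reset is
 * already in progress and the request should be dropped.
 */
bool sketch_try_begin_reset(struct sketch_dev *dev)
{
    bool expected = false;

    return atomic_compare_exchange_strong(&dev->in_reset, &expected, true);
}

void sketch_end_reset(struct sketch_dev *dev)
{
    dev->resets_run++;
    atomic_store(&dev->in_reset, false);
}
```

A request that arrives while `sketch_try_begin_reset` has already succeeded for another caller is dropped instead of being queued behind the running reset.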
On 4/22/24 8:35 AM, Jose Fernandez wrote:
When slice_height is 0, the division by slice_height in the calculation
of the number of slices will cause a division by zero driver crash. This
leaves the kernel in a state that requires a reboot. This patch adds a
check to avoid the division by
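The guard described in this message can be sketched as a standalone helper. This is a minimal sketch, not the patch itself: `sketch_num_slices_v` is a hypothetical name, and reporting failure through a boolean return value is an assumption about how the caller wants to learn that no valid slice count exists.

```c
#include <stdbool.h>

/*
 * Computes pic_height / slice_height only when slice_height is positive.
 * Returns false and leaves *num_slices_v untouched otherwise, so the
 * caller never divides by zero.
 */
bool sketch_num_slices_v(int pic_height, int slice_height, int *num_slices_v)
{
    if (slice_height <= 0)
        return false;

    *num_slices_v = pic_height / slice_height;
    return true;
}
```

For example, a picture height of 1080 with a slice height of 108 yields 10 slices, while a slice height of 0 is rejected before any division takes place.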
Am 23.04.24 um 16:44 schrieb Yunxiang Li:
Sometimes a hung GPU causes multiple reset sources to schedule resets.
If the second source schedules its reset after we call
amdgpu_device_stop_pending_resets, it will be able to trigger an
unnecessary reset.
Move amdgpu_device_stop_pending_resets to after the
Am 23.04.24 um 16:31 schrieb Tim Huang:
From: Tim Huang
Clear the warning about using the uninitialized value fw_size.
Signed-off-by: Tim Huang
Reviewed-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git
On 2024-04-23 09:59, Srinivasan Shanmugam wrote:
> This commit removes an unnecessary NULL check in the
> `dcn20_set_input_transfer_func` function in the `dcn20_hwseq.c` file.
> The variable `tf` is assigned the address of
> `plane_state->in_transfer_func` unconditionally, so it can never be
>
Bump the kfd ioctl minor version to declare the contiguous VRAM
allocation flag support.
Signed-off-by: Philip Yang
---
include/uapi/linux/kfd_ioctl.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index
If the BO pages pinned for RDMA are not contiguous in VRAM, evict the BO to
system memory first to free the VRAM space, then allocate contiguous
VRAM space, and then move it from system memory back to VRAM.
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 16
Define the macro MAX_SG_SEGMENT_SIZE as 2GB. Because the struct scatterlist
length is an unsigned int, and some users of it cast it to a signed int, every
segment of the sg table is limited to a maximum size of 2GB.
For contiguous VRAM allocation, don't limit the max buddy block size in
order to get contiguous VRAM
When TTM fails to alloc VRAM, it tries to evict BOs from VRAM to system
memory and then retries the allocation. This skips the KFD BOs from the same
process, because KFD requires all BOs to be resident for user queues.
If TTM with TTM_PL_FLAG_CONTIGUOUS flag to alloc contiguous VRAM, allow
TTM evict KFD BOs
TTM allocation of contiguous VRAM may take more than 1 second to evict BOs
for a larger RDMA buffer. Because the KFD restore BO worker reserves all
KFD BOs, TTM cannot take the remaining KFD BO locks to evict them,
which causes TTM to fail to alloc contiguous VRAM.
Increase the KFD restore BO wait
RDMA device with limited scatter-gather ability requires contiguous VRAM
buffer allocation for RDMA peer direct support.
Add a new KFD alloc memory flag and store as bo alloc flag
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA
peerdirect access, this will set
This patch series implements a new KFD memory alloc flag for best-effort contiguous
VRAM allocation, to support peer direct access for RDMA devices with limited
scatter-gather DMA capability.
v2: rebase on patch ("drm/amdgpu: Modify the contiguous flags behaviour")
to avoid adding the new GEM flag
Sometimes a hung GPU causes multiple reset sources to schedule resets.
If the second source schedules its reset after we call
amdgpu_device_stop_pending_resets, it will be able to trigger an
unnecessary reset.
Move amdgpu_device_stop_pending_resets to after the reset is already
done, since any reset
From: Tim Huang
Clear the warning about using the uninitialized value fw_size.
Signed-off-by: Tim Huang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
[AMD Official Use Only - General]
-Original Message-
From: Koenig, Christian
Sent: Tuesday, April 23, 2024 7:30 PM
To: Huang, Tim ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: [PATCH v2] drm/amdgpu: fix uninitialized scalar variable warning
Am 23.04.24 um 10:43
On 22/04/2024 13:10, Jani Nikula wrote:
> Surprisingly many places depend on debugfs.h to be included via
> drm_print.h. Fix them.
>
> v3: Also fix armada, ite-it6505, imagination, msm, sti, vc4, and xe
>
> v2: Also fix ivpu and vmwgfx
>
> Reviewed-by: Andrzej Hajda
> Acked-by: Maxime Ripard
On Tue, Apr 23, 2024 at 9:58 AM Christian König
wrote:
>
> Am 23.04.24 um 15:18 schrieb Alex Deucher:
> > On Tue, Apr 23, 2024 at 2:57 AM Christian König
> > wrote:
> >> Am 22.04.24 um 16:37 schrieb Alex Deucher:
> >>> As we use wb slots more dynamically, we need to lock
> >>> access to avoid
This commit removes an unnecessary NULL check in the
`dcn20_set_input_transfer_func` function in the `dcn20_hwseq.c` file.
The variable `tf` is assigned the address of
`plane_state->in_transfer_func` unconditionally, so it can never be
`NULL`. Therefore, the check `if (tf == NULL)` is unnecessary
Am 23.04.24 um 15:18 schrieb Alex Deucher:
On Tue, Apr 23, 2024 at 2:57 AM Christian König
wrote:
Am 22.04.24 um 16:37 schrieb Alex Deucher:
As we use wb slots more dynamically, we need to lock
access to avoid racing on allocation or free.
Wait a second. Why are we using the wb slots
On 2024-04-23 09:32, Christian König wrote:
Am 23.04.24 um 15:04 schrieb Philip Yang:
To test RDMA using dummy driver on the system without NIC/RDMA
device, the get/put dma pages pass in null device pointer, skip
Am 23.04.24 um 15:04 schrieb Philip Yang:
To test RDMA using a dummy driver on a system without a NIC/RDMA
device, the get/put dma pages calls pass in a null device pointer; skip the
dma map/unmap resource and sg table to avoid a null pointer access.
Well just to make it clear this patch is really a no-go
Am 23.04.24 um 15:04 schrieb Philip Yang:
Define the macro MAX_SG_SEGMENT_SIZE as 2GB. Because the struct scatterlist
length is an unsigned int, and some users of it cast it to a signed int, every
segment of the sg table is limited to a maximum size of 2GB.
For contiguous VRAM allocation, don't limit the max buddy
On Tue, Apr 23, 2024 at 2:57 AM Christian König
wrote:
>
> Am 22.04.24 um 16:37 schrieb Alex Deucher:
> > As we use wb slots more dynamically, we need to lock
> > access to avoid racing on allocation or free.
>
> Wait a second. Why are we using the wb slots dynamically?
>
See patch 2. I needed
RDMA device with limited scatter-gather ability requires contiguous VRAM
buffer allocation for RDMA peer direct support.
Add a new KFD alloc memory flag and store as bo alloc flag
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA
peerdirect access, this will set
When TTM fails to alloc VRAM, it tries to evict BOs from VRAM to system
memory and then retries the allocation. This skips the KFD BOs from the same
process, because KFD requires all BOs to be resident for user queues.
If TTM with TTM_PL_FLAG_CONTIGUOUS flag to alloc contiguous VRAM, allow
TTM evict KFD BOs
Bump the kfd ioctl minor version to declare the contiguous VRAM
allocation flag support.
Signed-off-by: Philip Yang
---
include/uapi/linux/kfd_ioctl.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index
If the BO pages pinned for RDMA are not contiguous in VRAM, evict the BO to
system memory first to free the VRAM space, then allocate contiguous
VRAM space, and then move it from system memory back to VRAM.
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 16
To test RDMA using a dummy driver on a system without a NIC/RDMA
device, the get/put dma pages calls pass in a null device pointer; skip the
dma map/unmap resource and sg table to avoid a null pointer access.
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 33
TTM allocation of contiguous VRAM may take more than 1 second to evict BOs
for a larger RDMA buffer. Because the KFD restore BO worker reserves all
KFD BOs, TTM cannot take the remaining KFD BO locks to evict them,
which causes TTM to fail to alloc contiguous VRAM.
Increase the KFD restore BO wait
Define the macro MAX_SG_SEGMENT_SIZE as 2GB. Because the struct scatterlist
length is an unsigned int, and some users of it cast it to a signed int, every
segment of the sg table is limited to a maximum size of 2GB.
For contiguous VRAM allocation, don't limit the max buddy block size in
order to get contiguous VRAM
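The effect of a per-segment size cap on a large contiguous buffer can be shown with a little arithmetic. This is a sketch under stated assumptions: `SKETCH_MAX_SG_SEGMENT_SIZE` and both helpers are hypothetical names, the 2 GiB value is taken from the description above, and none of this is the kernel's scatterlist API.

```c
#include <stdint.h>

/* Hypothetical cap of 2 GiB per segment, mirroring the description above. */
#define SKETCH_MAX_SG_SEGMENT_SIZE (2ULL << 30)

/* Number of segments needed to describe `bytes` of contiguous memory. */
uint64_t sketch_segments_needed(uint64_t bytes)
{
    return (bytes + SKETCH_MAX_SG_SEGMENT_SIZE - 1) / SKETCH_MAX_SG_SEGMENT_SIZE;
}

/* Length of the segment at 0-based `index`, or 0 if past the end. */
uint64_t sketch_segment_len(uint64_t bytes, uint64_t index)
{
    uint64_t start = index * SKETCH_MAX_SG_SEGMENT_SIZE;
    uint64_t left;

    if (start >= bytes)
        return 0;

    left = bytes - start;
    return left < SKETCH_MAX_SG_SEGMENT_SIZE ? left : SKETCH_MAX_SG_SEGMENT_SIZE;
}
```

With these definitions a 5 GiB contiguous buffer is described by three segments of 2 GiB, 2 GiB and 1 GiB. The memory stays physically contiguous; only its description is split.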
This patch series implements a new KFD memory alloc flag for best-effort contiguous
VRAM allocation, to support peer direct access for RDMA devices with limited
scatter-gather DMA capability.
v2: rebase on patch ("drm/amdgpu: Modify the contiguous flags behaviour")
to avoid adding the new GEM flag
Hi,
On 23/04/24 17:53, Dmitry Baryshkov wrote:
On Tue, 23 Apr 2024 at 13:24, Maíra Canal wrote:
On 4/23/24 01:02, Vignesh Raman wrote:
Uprev IGT to the latest version and stop vendoring the
testlist into the kernel. Instead, use the testlist from
the IGT build to ensure we do not miss
On Tue, 23 Apr 2024 at 13:24, Maíra Canal wrote:
>
> On 4/23/24 01:02, Vignesh Raman wrote:
> > Uprev IGT to the latest version and stop vendoring the
> > testlist into the kernel. Instead, use the testlist from
> > the IGT build to ensure we do not miss renamed or newly
> > added tests.
>
>
From: Alex Deucher
[ Upstream commit 781d41fed19caf900c8405064676813dc9921d32 ]
Convert a variable sized array from [1] to [].
v2: fix up a few more.
v3: integrate comments from Kees.
Reviewed-by: Kees Cook
Tested-by: Jeff Johnson (v2)
Acked-by: Christian König (v1)
Signed-off-by: Alex
Am 23.04.24 um 11:15 schrieb Bob Zhou:
If sdma_v4_0_irq_id_to_seq returns -EINVAL, processing should
stop to avoid an out-of-bounds read, so return -EINVAL directly.
Signed-off-by: Bob Zhou
Acked-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 3 +++
1 file
Am 23.04.24 um 10:43 schrieb Tim Huang:
From: Tim Huang
Clear the warning about using the uninitialized value fw_size.
Signed-off-by: Tim Huang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git
This reverts commit 925c7bd1d1cf9f173b22603c8bd4816d142d4935.
RCCL library is currently not treating spatial partitions differently,
hence this change is causing issues. Revert temporarily till RCCL
implementation is ready for spatial partitions.
Signed-off-by: Lijo Lazar
---
Am 23.04.24 um 10:12 schrieb Huang, Tim:
[AMD Official Use Only - General]
-Original Message-
From: amd-gfx On Behalf Of Huang, Tim
Sent: Tuesday, April 23, 2024 4:01 PM
To: Koenig, Christian ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: RE: [PATCH] drm/amdgpu: fix
The problem is that it's a catch-all case, and that's usually seen as bad
coding style.
In other words, when one branch accidentally forgets to set the fw_size, we
wouldn't get a warning any more and would just use zero.
Please rather add setting the fw_size to zero to the default branch and
maybe even
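The difference between initializing at the declaration and assigning in every branch can be sketched as follows. The enum, the sizes and the function name are hypothetical and chosen only for illustration; this is not the amdgpu_gfx.c code.

```c
/* Hypothetical firmware identifiers, for illustration only. */
enum sketch_ucode_id {
    SKETCH_UCODE_A,
    SKETCH_UCODE_B,
    SKETCH_UCODE_UNKNOWN
};

unsigned int sketch_fw_size(enum sketch_ucode_id id)
{
    /*
     * Deliberately not initialized here: if a branch forgets to assign
     * fw_size, the compiler can still warn about it.
     */
    unsigned int fw_size;

    switch (id) {
    case SKETCH_UCODE_A:
        fw_size = 1024;
        break;
    case SKETCH_UCODE_B:
        fw_size = 2048;
        break;
    default:
        /* Only the unknown case falls back to zero. */
        fw_size = 0;
        break;
    }

    return fw_size;
}
```

Writing `unsigned int fw_size = 0;` at the declaration would silence the same warning, but it would also hide a branch that forgot the assignment, which is the concern raised above.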
The current MES GFX mask prevents FW from enabling oversubscription. This patch
does the following:
- Fixes the mask values and adds a description for the same
- Removes the central mask setup and makes it IP specific, as it would
be different when the number of pipes and queues are different.
Cc:
If sdma_v4_0_irq_id_to_seq returns -EINVAL, processing should
stop to avoid an out-of-bounds read, so return -EINVAL directly.
Signed-off-by: Bob Zhou
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
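The pattern described in this commit message, checking a lookup result before using it as an array index, can be sketched in isolation. All names, the table contents and the instance count below are hypothetical; only the idea of propagating -EINVAL instead of indexing with it comes from the message.

```c
#include <errno.h>

#define SKETCH_NUM_INSTANCES 4

/* Hypothetical client IDs, one per instance. */
static const unsigned int sketch_client_ids[SKETCH_NUM_INSTANCES] = { 8, 9, 10, 11 };

/* Maps a client ID to an instance index, or returns -EINVAL if unknown. */
int sketch_irq_id_to_seq(unsigned int client_id)
{
    int i;

    for (i = 0; i < SKETCH_NUM_INSTANCES; i++) {
        if (sketch_client_ids[i] == client_id)
            return i;
    }
    return -EINVAL;
}

/* Counts an interrupt for the matching instance. */
int sketch_process_irq(unsigned int client_id, int counters[SKETCH_NUM_INSTANCES])
{
    int instance = sketch_irq_id_to_seq(client_id);

    /* Stop here: a negative value must never be used as an array index. */
    if (instance < 0)
        return instance;

    counters[instance]++;
    return 0;
}
```

Without the early return, `counters[instance]` would be evaluated with a negative index for an unknown client ID, which is the out-of-bounds access the message refers to.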
From: Tim Huang
Clear the warning about using the uninitialized value fw_size.
Signed-off-by: Tim Huang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
[Public]
Hi, Christian
The basic idea is to collect the following performance data and export this raw
data into a centralized debugfs. This raw data may help in performance tuning
from the AMDGPU kernel driver side. Additionally, this performance data should
be easily used for tool libraries
[AMD Official Use Only - General]
Hi Christian
Agree with you, returning an error is surely a better modification.
I will send v2 patch to fix this.
Regards,
Bob
-Original Message-
From: Koenig, Christian
Sent: 2024年4月23日 15:41
To: Zhou, Bob ; Koenig, Christian ;
Because the UE valid MCA count is only cleared after a GPU reset,
only dump the MCA log the first time the MCA bank is fetched after receiving a RAS
interrupt.
Signed-off-by: Yang Wang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c | 28 +
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h
v1:
because SMU CE valid mca bank will be cleared after reading,
this patch adds mca cache at the driver level to ensure that the mca bank is
not lost.
v2:
refine amdgpu_mca_init/fini/reset() function name.
v3:
add mca_cache.lock support
only add CE bank to mca bank cache.
Signed-off-by: Yang
- Refine mca driver code.
- Centralize mca bank dispatch code logic.
Signed-off-by: Yang Wang
Reviewed-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c | 97 ++---
1 file changed, 55 insertions(+), 42 deletions(-)
diff --git
- remove unused callback functions.
- make part of mca functions static and refine the function order.
Signed-off-by: Yang Wang
Reviewed-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c | 199 --
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h | 16 --
[AMD Official Use Only - General]
-Original Message-
From: amd-gfx On Behalf Of Huang, Tim
Sent: Tuesday, April 23, 2024 4:01 PM
To: Koenig, Christian ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: RE: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning
[AMD
[AMD Official Use Only - General]
Hi Christian,
-Original Message-
From: Koenig, Christian
Sent: Tuesday, April 23, 2024 3:43 PM
To: Huang, Tim ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Koenig, Christian
Subject: Re: [PATCH] drm/amdgpu: fix uninitialized scalar variable
…
> +++ b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
> @@ -1055,7 +1055,12 @@ static bool setup_dsc_config(
> if (!is_dsc_possible)
> goto done;
>
> - dsc_cfg->num_slices_v = pic_height/slice_height;
> + if (slice_height > 0)
> + dsc_cfg->num_slices_v =
When slice_height is 0, the division by slice_height in the calculation
of the number of slices will cause a division by zero driver crash. This
leaves the kernel in a state that requires a reboot. This patch adds a
check to avoid the division by zero.
The stack trace below is for the 6.8.4
On Mon, 22 Apr 2024 14:59:36 +0200
Christian König wrote:
> Am 22.04.24 um 14:33 schrieb Qiang Ma:
> > On Mon, 22 Apr 2024 11:40:26 +0200
> > Christian König wrote:
> >
> >> Am 22.04.24 um 07:26 schrieb Qiang Ma:
> >>> Some boards(like Oland PRO: 0x1002:0x6613) seem to have
> >>> garbage in
Am 23.04.24 um 08:28 schrieb Tim Huang:
Clear the warning about using the uninitialized value fw_size.
In which case is the fw_size uninitialized and why setting it to zero
helps in that case?
Regards,
Christian.
Signed-off-by: Tim Huang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
1
In this case we should modify amdgpu_i2c_get_byte() to return an error
and prevent writing the value back.
See, zero is as random as any other value, and initializing the variable
here doesn't really help; it just makes your warning disappear.
Regards,
Christian.
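The suggestion in this reply, having the read helper report failure so that the caller does not write anything back, can be sketched without the I2C subsystem. Everything named `sketch_*` is hypothetical, including the stub transfer functions that stand in for a real bus. The convention that a successful transfer returns 2 (the number of messages) follows the `== 2` comparison quoted elsewhere in this thread.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical transfer hook: returns the number of messages completed. */
typedef int (*sketch_xfer_fn)(uint8_t *in_buf);

/* Stub that behaves like a successful two-message transfer. */
int sketch_xfer_ok(uint8_t *in_buf)
{
    in_buf[0] = 0x40;
    return 2;
}

/* Stub that behaves like a failed transfer. */
int sketch_xfer_fail(uint8_t *in_buf)
{
    (void)in_buf;
    return 0;
}

/* Reads one byte; returns 0 on success or -EIO, leaving *val untouched. */
int sketch_get_byte(sketch_xfer_fn xfer, uint8_t *val)
{
    uint8_t in_buf[2] = { 0, 0 };

    if (xfer(in_buf) != 2)
        return -EIO;

    *val = in_buf[0];
    return 0;
}

/* Read-modify-write that skips the write-back when the read failed. */
int sketch_set_bits(sketch_xfer_fn xfer, uint8_t bits, uint8_t *written)
{
    uint8_t val;
    int r = sketch_get_byte(xfer, &val);

    if (r)
        return r; /* val was never set, so nothing is written back */

    *written = val | bits;
    return 0;
}
```

With an error return, `val` is only consumed after a successful read, so no initializer is needed to keep the compiler quiet and no arbitrary value reaches the device.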
Am 23.04.24 um 08:27 schrieb
[Public]
Thanks for your comments.
I should clarify the issue. As you see the amdgpu_i2c_get_byte code:
if (i2c_transfer(&i2c_bus->adapter, msgs, 2) == 2) {
*val = in_buf[0];
DRM_DEBUG("val = 0x%02x\n", *val);
} else {
Clear the warning about using the uninitialized value fw_size.
Signed-off-by: Tim Huang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index
[AMD Official Use Only - General]
Thanks for your comments.
I should clarify the issue. As you see the amdgpu_i2c_get_byte code:
if (i2c_transfer(&i2c_bus->adapter, msgs, 2) == 2) {
*val = in_buf[0];
DRM_DEBUG("val = 0x%02x\n", *val);
Am 22.04.24 um 21:59 schrieb Sonny Jiang:
From: Sonny Jiang
VCN5 session info package interface changed
Signed-off-by: Sonny Jiang
Mhm, in general we should push back on FW changes which make stuff like
that necessary. So what is the justification?
If the FW has a good justification
Am 22.04.24 um 16:37 schrieb Alex Deucher:
As we use wb slots more dynamically, we need to lock
access to avoid racing on allocation or free.
Wait a second. Why are we using the wb slots dynamically?
The number of slots made available is statically calculated, when this
is suddenly used
Am 23.04.24 um 04:53 schrieb Ma, Jun:
unsigned int client_id, src_id;
struct amdgpu_irq_src *src;
bool handled = false;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 924baf58e322..f0a63d084b4d 100644
---
[AMD Official Use Only - General]
OK
-
Best Regards,
Thomas
-Original Message-
From: Zhang, Hawking
Sent: Tuesday, April 23, 2024 11:27 AM
To: Chai, Thomas ; amd-gfx@lists.freedesktop.org
Cc: Zhou1, Tao ; Li, Candice ; Wang,
Yang(Kevin) ; Yang, Stanley
Subject: RE:
Am 23.04.24 um 07:33 schrieb Bob Zhou:
Because val isn't initialized, a random value is set by
amdgpu_i2c_put_byte.
So fix the uninitialized variable issue.
Well that isn't correct. See the code here:
amdgpu_i2c_get_byte(amdgpu_connector->router_bus,