> -Original Message-
> From: amd-gfx On Behalf Of
> Zhang, Jerry(Junwei)
> Sent: January 9, 2019 9:39
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org
> Cc: Li, Yukun1
> Subject: Re: [PATCH] drm/amdgpu: fix CPDMA hang in PRT mode for VEGA20
>
> On 1/8/19 6:55
Reviewed-by: Tao Zhou
Tao
> -Original Message-
> From: amd-gfx On Behalf Of Alex
> Deucher
> Sent: March 30, 2019 3:16
> To: amd-gfx@lists.freedesktop.org; airl...@gmail.com
> Cc: Deucher, Alexander
> Subject: [PATCH] drm/amdgpu/smu11: fix warning on 32bit arches
>
> Fixes
> warning:
Reviewed-by: Tao Zhou
> -Original Message-
> From: Evan Quan
> Sent: May 5, 2019 11:20
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhou1, Tao ; Quan, Evan
> Subject: [PATCH] drm/amd/powerplay: check for invalid profile_exit setting
>
> profile_exit performance lev
Referring to the series, patch #1 and #2.
Regards,
Tao
> -Original Message-
> From: Grodzovsky, Andrey
> Sent: August 14, 2019 10:42
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Pan, Xinhui
> ; Zhang, Hawking
> Subject: Re: [PATCH 1/2] drm
Hi Andrey:
I'm also working on ras error address saving based on your eeprom patches, and
my implementation is different from yours.
I'll send out my patches this week and we can discuss it.
Regards,
Tao
> -Original Message-
> From: amd-gfx On Behalf Of
> Andrey Grodzovsky
> Sent:
Considering that amdgpu_ras_error_query < 0 and !con are almost impossible, the patch
is:
Reviewed-by: Tao Zhou
> -Original Message-
> From: amd-gfx On Behalf Of
> Guchun Chen
> Sent: August 20, 2019 10:25
> To: amd-gfx@lists.freedesktop.org; Zhang, Hawking
> ; Li, Dennis ;
en
> Sent: August 16, 2019 15:10
> To: amd-gfx@lists.freedesktop.org; Zhang, Hawking
> ; Li, Dennis ; Pan, Xinhui
> ; Zhou1, Tao
> Cc: Chen, Guchun
> Subject: [PATCH] drm/amdgpu: correct return type of
> amdgpu_ras_query_error_count
>
> The return value type of amdgpu_ras_qu
> -Original Message-
> From: Andrey Grodzovsky
> Sent: August 22, 2019 4:02
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Pan, Xinhui
> ; Zhang, Hawking ;
> Tuikov, Luben ; Lazar, Lijo ;
> Quan, Evan ; Panariti, David
> ; Russell, Kent ; Zhou1,
, the series is:
Reviewed-by: Tao Zhou
> -Original Message-
> From: Zhang, Hawking
> Sent: September 3, 2019 8:02
> To: amd-gfx@lists.freedesktop.org; Zhou1, Tao ;
> Chen, Guchun ; Deucher, Alexander
>
> Cc: Zhang, Hawking
> Subject: [PATCH 01/10] drm/amdgpu: set ip specif
change the type of r from bool to int, which suits both bool and int return
values
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index
The series is:
Reviewed-by: Tao Zhou
> -Original Message-
> From: Guchun Chen
> Sent: August 29, 2019 16:59
> To: amd-gfx@lists.freedesktop.org; Zhang, Hawking
> ; Li, Dennis ; Koenig,
> Christian ; Deucher, Alexander
> ; Zhou1, Tao
> Cc: Li, Candice ; Chen, Guc
> -Original Message-
> From: amd-gfx On Behalf Of
> Andrey Grodzovsky
> Sent: August 29, 2019 4:00
> To: amd-gfx@lists.freedesktop.org
> Cc: alexdeuc...@gmail.com; ckoenig.leichtzumer...@gmail.com;
> Grodzovsky, Andrey ; Zhang, Hawking
>
> Subject: [PATCH 1/2] dmr/amdgpu: Avoid HW GPU reset
> -Original Message-
> From: Hawking Zhang
> Sent: August 29, 2019 21:30
> To: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> ; Zhou1, Tao ; Chen,
> Guchun
> Cc: Zhang, Hawking
> Subject: [PATCH 1/7] drm/amdgpu: add helper function to do common
With the two points in patch #1 and patch #5 fixed, the series is:
Reviewed-by: Tao Zhou
> -Original Message-
> From: amd-gfx On Behalf Of
> Hawking Zhang
> Sent: August 29, 2019 21:31
> To: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> ; Zhou1, Tao ; Chen,
>
> -Original Message-
> From: Andrey Grodzovsky
> Sent: August 30, 2019 8:54
> To: amd-gfx@lists.freedesktop.org
> Cc: alexdeuc...@gmail.com; Zhang, Hawking ;
> ckoenig.leichtzumer...@gmail.com; Zhou1, Tao ;
> Grodzovsky, Andrey
> Subject: [PATCH v2 1/2] dmr/am
> -Original Message-
> From: Hawking Zhang
> Sent: August 29, 2019 21:31
> To: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> ; Zhou1, Tao ; Chen,
> Guchun
> Cc: Zhang, Hawking
> Subject: [PATCH 5/7] drm/amdgpu: add mmhub ras_late_init callback
> func
> -Original Message-
> From: amd-gfx On Behalf Of
> Hawking Zhang
> Sent: August 26, 2019 11:55
> To: amd-gfx@lists.freedesktop.org; Deucher, Alexander
>
> Cc: Zhang, Hawking
> Subject: [PATCH 7/8] drm/amdgpu: enable/disable ras_controller_irq and
> err_event_athub_irq
>
[Tao] need a
Patch #1 ~ #6 and patch #8 are:
Reviewed-by: Tao Zhou
> -Original Message-
> From: amd-gfx On Behalf Of
> Hawking Zhang
> Sent: August 26, 2019 11:55
> To: amd-gfx@lists.freedesktop.org; Deucher, Alexander
>
> Cc: Zhang, Hawking
> Subject: [PATCH 1/8] drm/amdgpu: add new amdgpu nbio
> -Original Message-
> From: Chen, Guchun
> Sent: September 2, 2019 10:13
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org;
> Grodzovsky, Andrey ; Li, Dennis
> ; Zhang, Hawking
> Cc: Zhou1, Tao
> Subject: RE: [PATCH 1/4] drm/amdgpu: change ras bps type to eeprom
> -Original Message-
> From: Chen, Guchun
> Sent: September 2, 2019 10:11
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org;
> Grodzovsky, Andrey ; Li, Dennis
> ; Zhang, Hawking
> Cc: Zhou1, Tao
> Subject: RE: [PATCH 2/4] drm/amdgpu: Hook EEPROM table to RAS
>
>
> -Original Message-
> From: Grodzovsky, Andrey
> Sent: August 30, 2019 22:03
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org;
> Chen, Guchun ; Li, Dennis ;
> Zhang, Hawking
> Subject: Re: [PATCH 4/4] drm/amdgpu: move the call of ras recovery_init and
> bad page
From: Grodzovsky, Andrey
Sent: August 22, 2019 23:07
To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Pan, Xinhui
; Zhang, Hawking ; Tuikov, Luben
; Lazar, Lijo ; Quan, Evan
; Panariti, David ; Russell, Kent
Subject: Re: [PATCH v4 1/4] drm/amdgpu: Add RAS EEPROM table
Another way is to add a check for ih_info in amdgpu_ras_interrupt_add_handler and
amdgpu_ras_interrupt_remove_handler directly.
> -Original Message-
> From: amd-gfx On Behalf Of
> Zhou1, Tao
> Sent: August 29, 2019 10:59
> To: Zhang, Hawking ; amd-
> g...@lists.freede
Can we also add a ras_late_init for umc?
> -Original Message-
> From: amd-gfx On Behalf Of
> Zhou1, Tao
> Sent: August 29, 2019 11:41
> To: Zhang, Hawking ; amd-
> g...@lists.freedesktop.org; Deucher, Alexander
>
> Cc: Zhang, Hawking
> Subject: RE: [PATCH
> -Original Message-
> From: Hawking Zhang
> Sent: August 28, 2019 21:03
> To: amd-gfx@lists.freedesktop.org; Zhou1, Tao ;
> Deucher, Alexander
> Cc: Zhang, Hawking
> Subject: [PATCH 1/7] drm/amdgpu: add helper function to do common
> ras_late_init
>
> In
> -Original Message-
> From: amd-gfx On Behalf Of
> Hawking Zhang
> Sent: August 28, 2019 21:03
> To: amd-gfx@lists.freedesktop.org; Zhou1, Tao ;
> Deucher, Alexander
> Cc: Zhang, Hawking
> Subject: [PATCH 2/7] drm/amdgpu: switch to amdgpu_ras_late_init for gfx v9
> -Original Message-
> From: Hawking Zhang
> Sent: August 28, 2019 21:03
> To: amd-gfx@lists.freedesktop.org; Zhou1, Tao ;
> Deucher, Alexander
> Cc: Zhang, Hawking
> Subject: [PATCH 5/7] drm/amdgpu: add mmhub ras_late_init callback
> function
>
> The functi
> -Original Message-
> From: Hawking Zhang
> Sent: August 28, 2019 21:03
> To: amd-gfx@lists.freedesktop.org; Zhou1, Tao ;
> Deucher, Alexander
> Cc: Zhang, Hawking
> Subject: [PATCH 6/7] drm/amdgpu: add ras_late_init callback function for
> nbio v7_4
>
> r
umc late init is umc specific, it's more suitable to be put in umc block
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/Makefile | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 48
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 2 -
move umc ras init from ras module to umc block, generic ras module
should pay less attention to specific ras block.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 4
2 files changed, 4 insertions(+), 4 deletions(-)
this interface is related to specific version of umc, distinguish it
from ras_late_init
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 2 +-
drivers/gpu/drm/amd/amdgpu/umc_v6_1.c | 2 +-
3 files changed, 4 insertions(+),
It's better to add a space after "*" in the comment. With this fixed, the series is:
Reviewed-by: Tao Zhou
> -Original Message-
> From: Andrey Grodzovsky
> Sent: September 5, 2019 10:50
> To: amd-gfx@lists.freedesktop.org
> Cc: alexdeuc...@gmail.com; Zhang, Hawking ;
>
change bps type from retired page to eeprom table record, prepare for
saving umc error records to eeprom
Signed-off-by: Tao Zhou
Reviewed-by: Guchun Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 59 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 11 +++--
2 files
ras recovery_init should be called after ttm init,
bad page reserve should be put in front of gpu reset since i2c
may be unstable during gpu reset.
add cleanup for recovery_init and recovery_fini
v2: add more comment and print.
remove cancel_work_sync in recovery_init.
Signed-off-by: Tao
support eeprom records load and save for ras,
move EEPROM records storing to bad page reserving
v2: remove redundant check for con->eh_data
Signed-off-by: Tao Zhou
Signed-off-by: Andrey Grodzovsky
Reviewed-by: Guchun Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 109
save umc error records to ras bad page array
v2: add bad pages before gpu reset
v3: add NULL check for adev->umc.funcs
Signed-off-by: Tao Zhou
Signed-off-by: Andrey Grodzovsky
Reviewed-by: Guchun Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 +-
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> -Original Message-
> From: Chen, Guchun
> Sent: September 6, 2019 18:01
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org;
> Zhang, Hawking
> Subject: RE: [PATCH 1/3] drm/amdgpu: move umc late init from gmc to umc
> block
>
>
>
> -----Original Messag
Chen, Guchun ; Zhou1, Tao
> ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander
> Subject: Re: [PATCH] drm/amdgpu: Fix mutex lock from atomic context.
>
> That not what I meant. Let's say you handled one bad page interrupt and as
> a result have one bad page reserved. Now unrela
There are two cases where a reserve error should be ignored:
1) a ras bad page has been allocated (used by someone);
2) a ras bad page has been reserved (duplicate error injection for one page);
DRM_ERROR is unnecessary for the failure of bad page reserve
Signed-off-by: Tao Zhou
---
Reviewed-by: Tao Zhou
> -Original Message-
> From: Chen, Guchun
> Sent: September 10, 2019 16:44
> To: amd-gfx@lists.freedesktop.org; Zhang, Hawking
> ; Zhou1, Tao ;
> Grodzovsky, Andrey
> Cc: Chen, Guchun
> Subject: [PATCH] drm/amdgpu: remove duplicated header file i
Reviewed-by: Tao Zhou
From: Yin, Tianci (Rico)
Sent: September 10, 2019 16:59
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Zhou1, Tao ; Xu,
Feifei ; Long, Gang
Subject: [Patch] drm/amdgpu: fix CPDMA hang in PRT mode for
To: amd-gfx@lists.freedesktop.org
> Cc: Chen, Guchun ; Zhou1, Tao
> ; Deucher, Alexander
> ; Grodzovsky, Andrey
>
> Subject: [PATCH] drm/amdgpu: Fix mutex lock from atomic context.
>
> Problem:
> amdgpu_ras_reserve_bad_pages was moved to amdgpu_ras_reset_gpu
> because writing t
umc late init is umc specific, it's more suitable to be put in umc block
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/Makefile | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 48
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 2 -
move umc ras init from ras module to umc block, generic ras module
should pay less attention to specific ras block.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 4
2 files changed, 4 insertions(+), 4 deletions(-)
this interface is related to specific version of umc, distinguish it
from ras_late_init
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 2 +-
drivers/gpu/drm/amd/amdgpu/umc_v6_1.c | 8
3 files changed, 7
> -Original Message-
> From: Andrey Grodzovsky
> Sent: September 10, 2019 4:04
> To: amd-gfx@lists.freedesktop.org
> Cc: Chen, Guchun ; Zhou1, Tao
> ; Deucher, Alexander
> ; Grodzovsky, Andrey
>
> Subject: [PATCH 2/2] drm/amdgpu: Allow to reset to EERPOM table.
>
Reviewed-by: Tao Zhou
> -Original Message-
> From: amd-gfx On Behalf Of Alex
> Deucher
> Sent: September 17, 2019 3:51
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander
> Subject: [PATCH] drm/amdgpu/ras: use GPU PAGE_SIZE/SHIFT for reserving
> pages
>
> We are reserving vram pages
There are two cases where a reserve error should be ignored:
1) a ras bad page has been allocated (used by someone);
2) a ras bad page has been reserved (duplicate error injection for one page);
DRM_ERROR is unnecessary for the failure of bad page reserve
Signed-off-by: Tao Zhou
---
replace offset with size
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index ecad84e1b4e2..2fcd2d14cbf0 100644
> -Original Message-
> From: Chen, Guchun
> Sent: September 17, 2019 14:52
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org;
> Zhang, Hawking
> Subject: RE: [PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in
> ras_reserve_bad_pages
>
>
>
> -----Origina
the bo pointer is reused for bad pages, initialize it in each loop
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
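The point about resetting the reused bo pointer in each loop iteration can be sketched roughly as below. This is only an illustration, not the driver code: struct example_bo, example_bo_create and example_reserve_bad_pages are hypothetical names, and the allocator is assumed to create a new object only when the pointer it receives is NULL.

```c
#include <stddef.h>

/* Hypothetical stand-in for a buffer object; not the driver's type. */
struct example_bo {
    int id;
};

/*
 * Hypothetical allocator for illustration: it hands out pool[idx] only
 * when *bo is NULL and otherwise leaves the caller's pointer untouched.
 */
static int example_bo_create(struct example_bo *pool, size_t idx,
                             struct example_bo **bo)
{
    if (*bo == NULL) {
        pool[idx].id = (int)idx;
        *bo = &pool[idx];
    }
    return 0;
}

/* Reserve one object per bad page, resetting the shared pointer each time. */
static void example_reserve_bad_pages(struct example_bo *pool,
                                      struct example_bo **out, size_t count)
{
    struct example_bo *bo;
    size_t i;

    for (i = 0; i < count; i++) {
        /*
         * If bo were set to NULL only once before the loop, every slot
         * would end up pointing at the first object.
         */
        bo = NULL;
        example_bo_create(pool, i, &bo);
        out[i] = bo;
    }
}
```

With the reset in place, each entry of out refers to its own object.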
umc retired page belongs to vram and it should be aligned to gpu page
size
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/umc_v6_1.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
index
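As a rough illustration of aligning a retired page address to the gpu page size, the sketch below rounds an error address down to a page boundary. The names are hypothetical, and the 4 KiB page size (shift of 12) is an assumption made for the example rather than something stated in the patch.

```c
#include <stdint.h>

/* Assumption for illustration: a 4 KiB GPU page, i.e. a shift of 12. */
#define EXAMPLE_GPU_PAGE_SHIFT 12
#define EXAMPLE_GPU_PAGE_SIZE  (1ULL << EXAMPLE_GPU_PAGE_SHIFT)

/* Round an error address down to the start of the GPU page containing it. */
static uint64_t example_retired_page_base(uint64_t err_addr)
{
    return err_addr & ~(EXAMPLE_GPU_PAGE_SIZE - 1);
}

/* Convert an error address to a GPU page frame number. */
static uint64_t example_retired_page_pfn(uint64_t err_addr)
{
    return err_addr >> EXAMPLE_GPU_PAGE_SHIFT;
}
```

Under that assumption, an error address of 0x12345 falls in the page starting at 0x12000, which is page frame 0x12.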
ame it to umc.funcs->ras_hw_init?
Regards,
Tao
> -Original Message-
> From: Zhang, Hawking
> Sent: September 9, 2019 6:40
> To: Zhang, Hawking ; Zhou1, Tao
> ; amd-gfx@lists.freedesktop.org; Chen, Guchun
>
> Subject: RE: [PATCH 3/3] drm/amdgpu: rename umc ras_init to ras_asic_init
>
"control = >eeprom_control" is suggested, apart from this, the series is:
Reviewed-by: Tao Zhou
> -Original Message-
> From: Chen, Guchun
> Sent: September 18, 2019 11:38
> To: amd-gfx@lists.freedesktop.org; Zhang, Hawking
> ; Zhou1, Tao ;
> Grodzovsky, Andrey
gmc_ras_fini can be shared among all generations of gmc
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 26 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 1 +
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 28 +
3 files changed, 28
gfx_ras_late_init can get the info by itself
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 16 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 3 +--
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 5 +
3 files changed, 9 insertions(+), 15 deletions(-)
diff
umc_ras_late_init can get the info by itself
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 15 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 4 ++--
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 5 +
3 files changed, 10 insertions(+), 14 deletions(-)
diff
sdma_ras_fini can be shared among all generations of sdma
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 19 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 1 +
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 16 +---
3 files changed, 21
it's more suitable to put umc ras fini in umc block
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 12 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 15 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 1 +
3 files changed, 17 insertions(+), 11
simplify the code that accesses the eeprom_control struct
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
add a common nbio ras fini implementation to cleanup nbio ras framework
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c | 14 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h | 2 +-
drivers/gpu/drm/amd/amdgpu/soc15.c | 1 +
3 files changed, 16 insertions(+),
common gmc_ecc_late_init can be shared among all generations of gmc
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 19 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 1 +
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 22 +-
3 files changed,
it's more suitable to put xgmi ras fini in xgmi block
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 13 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 14 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 1 +
3 files changed, 17 insertions(+), 11
some refinements for RAS, no functional change:
1. make more ras code reusable among different generations of ras
block;
2. make some ras code simpler;
Tao Zhou (21):
drm/amdgpu: update parameter of ras_ih_cb
drm/amdgpu: move umc ras irq functions to umc block
drm/amdgpu: move gfx
move umc ras irq functions from gmc v9 to generic umc block
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 65 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 6 +++
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 68 +
3 files
umc_ras_if is relevant to umc
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 1 -
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 28 -
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 1 +
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4 ++--
4 files changed,
put mmhub_funcs and ras_if pointer into mmhub struct
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.h | 5 +
2 files changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
simplify code logic and refine return value
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 32 ++
1 file changed, 17 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index
gfx_ras_fini can be shared among all generations of gfx
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 15 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 1 +
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 14 +-
3 files changed, 17 insertions(+), 13
add ras fini for xgmi to cleanup xgmi ras framework
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 11 +++
1 file changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index
mmhub_ras_if is relevant to mmhub
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 1 -
drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.c | 24 +++
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4 ++--
3 files changed, 14 insertions(+), 15 deletions(-)
diff
it's more suitable to put mmhub ras fini in mmhub block
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 12 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.c | 14 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.h | 2 +-
3 files changed, 16 insertions(+),
remove mmhub_funcs in adev
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 6 +++---
3 files changed, 5 insertions(+), 6 deletions(-)
diff --git
gfx ras ecc common functions could be reused among all gfx generations
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 33
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 6
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 41 ++---
3
sdma ras ecc functions can be reused among all sdma generations
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 28
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 6 +
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 24 ++--
3 files
change struct ras_err_data *err_data to void *err_data to align with the
implementation of the umc code, so the callback's declaration in each ras
block need not care about the structure type
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 +-
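The idea of passing err_data as void * can be sketched as follows. This is a minimal illustration with hypothetical names (example_err_data, example_ras_cb, example_umc_cb, example_dispatch), not the actual amdgpu declarations: the callback type only sees an opaque pointer, and the block that knows the real structure converts it back.

```c
#include <stddef.h>

/* Hypothetical error record; the real structure layout is not shown here. */
struct example_err_data {
    unsigned long ce_count;
    unsigned long ue_count;
};

/* Callback type that does not depend on the error structure. */
typedef int (*example_ras_cb)(void *err_data);

/* A block-specific callback converts the opaque pointer back itself. */
static int example_umc_cb(void *err_data)
{
    struct example_err_data *data = err_data;

    if (data == NULL)
        return -1;
    data->ce_count += 1; /* pretend one correctable error was found */
    return 0;
}

/* Generic code passes the pointer through without inspecting it. */
static int example_dispatch(example_ras_cb cb, void *err_data)
{
    return cb ? cb(err_data) : -1;
}
```

The trade-off is that the compiler no longer checks the pointer type, so each callback has to know which structure it was given.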
The series is:
Reviewed-by: Tao Zhou
> -Original Message-
> From: amd-gfx On Behalf Of
> Guchun Chen
> Sent: August 7, 2019 14:52
> To: amd-gfx@lists.freedesktop.org; Zhang, Hawking
> ; Li, Dennis ; Pan, Xinhui
> ; Zhou1, Tao
> Cc: Chen, Guchun
> Subject: [P
Reviewed-by: Tao Zhou
> -Original Message-
> From: amd-gfx On Behalf Of
> Guchun Chen
> Sent: August 8, 2019 14:59
> To: amd-gfx@lists.freedesktop.org; Zhang, Hawking
> ; Li, Dennis ; Pan, Xinhui
> ; Zhou1, Tao
> Cc: Chen, Guchun
> Subject: [PATCH] drm/amdgp
age-
> From: Zhang, Hawking
> Sent: August 8, 2019 15:16
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org;
> Chen, Guchun ; Li, Dennis ;
> Pan, Xinhui ; Clements, John
>
> Cc: Zhou1, Tao
> Subject: RE: [PATCH 1/3] drm/amdgpu: add amdgpu_mmhub_funcs
> definition
>
> F
> -Original Message-
> From: Chen, Guchun
> Sent: August 1, 2019 16:22
> To: Zhang, Hawking ; Zhou1, Tao
> ; amd-gfx@lists.freedesktop.org; Li, Dennis
> ; Pan, Xinhui
> Cc: Zhou1, Tao
> Subject: RE: [PATCH 0/4] enable umc ras ce interrupt
>
> 1) Patch 1,
The EccErrInt field of EccErrCntSel specifies the type of interrupt; it's not a
threshold. But my comment in the code is not accurate, so I'll update the comment.
> -Original Message-
> From: Zhang, Hawking
> Sent: August 1, 2019 15:52
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org;
> -Original Message-
> From: Koenig, Christian
> Sent: August 9, 2019 14:42
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org;
> Deucher, Alexander ; Zhang, Hawking
>
> Subject: Re: [PATCH 1/2] drm/amdgpu: implement UMC 64 bits REG
> operations
>
> Am 09.0
Please change all "eject" to "inject" in commit subject and description.
-Original Message-
From: amd-gfx On Behalf Of Guchun Chen
Sent: August 6, 2019 15:36
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; Li,
Dennis ; Pan, Xinhui ; Zhou1, Tao
Cc: Chen, Guchun
Subj
-Original Message-
From: amd-gfx On Behalf Of Guchun Chen
Sent: August 6, 2019 15:36
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; Li,
Dennis ; Pan, Xinhui ; Zhou1, Tao
Cc: Li, Dennis ; Chen, Guchun
Subject: [PATCH libdrm 2/3] tests/amdgpu/ras: refine ras eject test
Ras eject test
the info of retired page's bo may be lost if vram lost is encountered
in gpu reset (gpu page table in vram may be lost), force to recreate
all bos
Signed-off-by: Tao Zhou
Suggested-by: Andrey Grodzovsky
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
> -Original Message-
> From: Chen, Guchun
> Sent: September 29, 2019 15:06
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org;
> Grodzovsky, Andrey ; Zhang, Hawking
>
> Subject: RE: [PATCH] drm/amdgpu: recreate retired page's bo if vram get lost
> in gpu reset
>
> -Original Message-
> From: Chen, Guchun
> Sent: September 30, 2019 11:26
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org;
> Grodzovsky, Andrey ; Zhang, Hawking
>
> Subject: RE: [PATCH 3/3] drm/amdgpu: reuse code of ras bad page's bo
> create
>
>
> Reg
the info of retired page's bo may be lost if vram lost is encountered
in gpu reset (gpu page table in vram may be lost), force to recreate
all bos
v2: simplify NULL pointer check
add more comments
Signed-off-by: Tao Zhou
Suggested-by: Andrey Grodzovsky
---
guarantee bo pointers in bad page bo array are NULL after allocation
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
implement ras_create_bad_pages_bo to simplify ras code
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 72 +++--
1 file changed, 31 insertions(+), 41 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> -Original Message-
> From: Chen, Guchun
> Sent: September 30, 2019 15:14
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org;
> Zhang, Hawking
> Subject: RE: [PATCH] drm/amdgpu: avoid ras error injection for retired page
>
>
>
>
> Regards,
> Guchun
>
check whether a page is a bad page before error injection
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 38 +
1 file changed, 38 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
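Checking for a bad page before error injection could look roughly like the sketch below. It is an assumption-laden illustration, not the patch itself: the function names, the flat array of retired page frame numbers, and the 4 KiB page shift are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: a 4 KiB GPU page, i.e. a shift of 12. */
#define EXAMPLE_PAGE_SHIFT 12

/* Return true if addr falls inside a page already recorded as bad. */
static bool example_is_bad_page(const uint64_t *bad_pfns, size_t count,
                                uint64_t addr)
{
    uint64_t pfn = addr >> EXAMPLE_PAGE_SHIFT;
    size_t i;

    for (i = 0; i < count; i++) {
        if (bad_pfns[i] == pfn)
            return true;
    }
    return false;
}

/* Refuse to inject an error into a page that is already retired. */
static int example_inject_error(const uint64_t *bad_pfns, size_t count,
                                uint64_t addr)
{
    if (example_is_bad_page(bad_pfns, count, addr))
        return -1; /* skip injection for a retired page */
    /* the real injection request would be issued here */
    return 0;
}
```

A linear scan is used only to keep the example short; it says nothing about how the driver stores or searches its records.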
allocation information will not be lost in gpu reset according to
Christian's comments. Do you have any other concerns?
Regards,
Tao
> -Original Message-
> From: Christian König
> Sent: September 30, 2019 16:35
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org;
> Grodzovsky, Andrey
Two questions:
1. "we lose all reservation during ASIC reset"
Are you sure about that? As I remember, the content of vram may be lost after reset, but
the allocation status could be preserved.
2. You change the bad page handling flow from:
detect bad page -> reserve vram for bad page -> save bad page info
ta->count) pages should be reserved.
Tao
> -Original Message-
> From: Grodzovsky, Andrey
> Sent: November 15, 2019 0:42
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org;
> Koenig, Christian
> Cc: alexdeuc...@gmail.com; Chen, Guchun ;
> Zhang, Hawking
> Subject: Re:
> From: Christian König
> Sent: November 20, 2019 19:27
> To: Zhu, Changfeng ; Koenig, Christian
> ; Xiao, Jack ; Zhou1, Tao
> ; Huang, Ray ; Huang,
> Shimmer ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 2/2] drm/amdgpu: invalidate mmhub semphore
> workaround in gmc9/gmc1
> -Original Message-
> From: Le Ma
> Sent: 2019年11月27日 17:15
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Chen, Guchun
> ; Zhou1, Tao ; Li, Dennis
> ; Deucher, Alexander
> ; Ma, Le
> Subject: [PATCH 05/10] drm/amdgpu: enable/disable doorbell int
I'll add a new patch.
Regards,
Tao
> -Original Message-
> From: Zhang, Hawking
> Sent: September 19, 2019 22:48
> To: Chen, Guchun ; Zhou1, Tao
> ; amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH 05/21] drm/amdgpu: refine sdma4 ras_data_cb
>
> Let's add comments to
> -Original Message-
> From: Chen, Guchun
> Sent: September 19, 2019 22:11
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org;
> Zhang, Hawking
> Subject: RE: [PATCH 11/21] drm/amdgpu: add common gfx_ras_fini function
>
>
> -Original Message-
> From
> -Original Message-
> From: Chen, Guchun
> Sent: September 19, 2019 21:59
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org;
> Zhang, Hawking
> Subject: RE: [PATCH 05/21] drm/amdgpu: refine sdma4 ras_data_cb
>
>
>
>
> Regards,
> Guchun
>
> ---