On Nov 28, 2018, at 00:11, Alex Deucher <[email protected]> wrote:
>
> On Tue, Nov 27, 2018 at 4:56 AM Christian König
> <[email protected]> wrote:
>>
>> Am 27.11.18 um 02:47 schrieb Zhang, Jerry(Junwei):
>>
>> On 11/26/18 5:28 PM, Christian König wrote:
>>
>> Am 26.11.18 um 03:38 schrieb Zhang, Jerry(Junwei):
>>
>> On 11/24/18 3:32 AM, Deucher, Alexander wrote:
>>
>> Is this required? Are the harvesting fuses incorrect? If the blocks are
>> harvested, we should bail out of the blocks properly during init. Also,
>> please make this more explicit if we still need it. E.g.,
>>
>>
>>
>> The harvest fuse is indeed disabling UVD and VCE, as it's a mining card.
>> Any command sent to UVD/VCE then causes a NULL pointer issue, e.g. in
>> amdgpu_test.
>>
>>
>> In this case we should fix the NULL pointer issue instead. Do you have a
>> backtrace for this?
>>
>>
>> Sorry for missing the detail.
>> The NULL pointer is caused by UVD not being initialized, as it's disabled in
>> the VBIOS for this kind of card.
>>
>>
>> Yeah, but that should be handled correctly.
>>
>>
>> On CS submit, amdgpu_cs_ib_fill() checks ring->funcs->parse_cs.
>> However, uvd_v6_0_early_init() skips setting the ring functions, because
>> CC_HARVEST_FUSES marks UVD/VCE as disabled.
>> Accessing the UVD/VCE ring's funcs then causes the NULL pointer issue.
>>
>> BTW, the Windows driver disables UVD/VCE for it as well.
>>
>>
>> You are approaching this from the wrong side. The fact that UVD/VCE is
>> disabled should already be handled correctly.
>>
>> The problem is rather that in a couple of places (amdgpu_ctx_init for
>> example) we assume that we have at least one UVD/VCE ring.
>>
>> Alex is right that checking the fuses should be sufficient and we rather
>> need to fix the handling here instead of adding another workaround.
>
> Exactly. There are already cards out there with no UVD or VCE, so we
> need to fix this if it's a problem. It sounds like userspace is
> submitting work to the VCE or UVD rings without checking whether or
> not the device supports them in the first place. We should do a
> better job of guarding against that in the kernel.
Thanks, all.
I understand now.
We could also print a message saying that UVD/VCE is not initialized, since
the log currently makes it look as if it initialized successfully:
```
[ 15.730219] [drm] add ip block number 7 <uvd_v6_0>
```
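A rough sketch of what such a message and a submit-time check could look like.
This is a standalone C model, not driver code: the demo_* names, the struct
layout and the error codes are assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins, not the real amdgpu structures. */
struct demo_ring_funcs {
	int (*parse_cs)(unsigned int ib_idx);
};

struct demo_ring {
	const char *name;
	const struct demo_ring_funcs *funcs; /* stays NULL if the block is harvested */
};

static int demo_parse_cs(unsigned int ib_idx)
{
	(void)ib_idx;
	return 0;
}

static const struct demo_ring_funcs demo_uvd_funcs = {
	.parse_cs = demo_parse_cs,
};

/* Models an early-init step: skip setup, and say so, when the fuse
 * reports the block as disabled. */
static int demo_early_init(struct demo_ring *ring, bool harvested)
{
	assert(ring != NULL);

	if (harvested) {
		ring->funcs = NULL;
		fprintf(stderr, "[demo] %s is harvested, not initializing\n",
			ring->name);
		return -ENOENT;
	}

	ring->funcs = &demo_uvd_funcs;
	return 0;
}

/* Models a submit-time check: refuse rings that were never set up
 * instead of dereferencing a NULL funcs pointer. */
static int demo_submit(const struct demo_ring *ring, unsigned int ib_idx)
{
	if (!ring || !ring->funcs)
		return -EINVAL;

	if (ring->funcs->parse_cs)
		return ring->funcs->parse_cs(ib_idx);

	return 0;
}
```

With a harvested ring, demo_early_init() logs the skip and demo_submit()
returns -EINVAL rather than touching ring->funcs.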
I can check this after my vacation (back next week).
BTW, is that handled by the patch series "[PATCH 1/6] drm/amdgpu: add VCN
JPEG support amdgpu_ctx_num_entities"?
I tried applying the patches; amdgpu_test seems to hang at the Userptr Test
(verified on the latest staging build).
Please confirm.
[ 4388.759743] BUG: unable to handle kernel NULL pointer dereference at
0000000000000008
[ 4388.759782] IP: amddrm_sched_entity_flush+0x2d/0x1d0 [amd_sched]
[ 4388.759807] PGD 0 P4D 0
[ 4388.759820] Oops: 0000 [#1] SMP PTI
[ 4388.759834] Modules linked in: amdgpu(OE) amdchash(OE) amdttm(OE)
amd_sched(OE) amdkcl(OE) amd_iommu_v2 drm_kms_helper drm i2c_algo_bit
fb_sys_fops syscopyarea sysfillrect sysimgblt nls_utf8 cifs ccm rpcsec_gss_krb5
nfsv4 nfs fscache binfmt_misc nls_iso8859_1 snd_hda_codec_realtek
snd_hda_codec_generic intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp
kvm_intel snd_hda_codec_hdmi kvm snd_hda_intel irqbypass crct10dif_pclmul
snd_hda_codec crc32_pclmul snd_hda_core snd_hwdep ghash_clmulni_intel
snd_seq_midi snd_seq_midi_event pcbc snd_pcm snd_rawmidi snd_seq snd_seq_device
snd_timer aesni_intel aes_x86_64 crypto_simd eeepc_wmi glue_helper snd cryptd
asus_wmi intel_cstate soundcore shpchp intel_rapl_perf mei_me wmi_bmof
intel_wmi_thunderbolt sparse_keymap serio_raw mei
acpi_pad mac_hid sch_fq_codel
[ 4388.760141] nfsd auth_rpcgss nfs_acl parport_pc lockd ppdev grace lp sunrpc
parport ip_tables x_tables autofs4 mxm_wmi e1000e psmouse ptp pps_core ahci
libahci wmi video
[ 4388.760212] CPU: 7 PID: 915 Comm: amdgpu_test Tainted: G OE
4.15.0-39-generic #42-Ubuntu
[ 4388.760250] Hardware name: System manufacturer System Product Name/Z170-A,
BIOS 1302 11/09/2015
[ 4388.760287] RIP: 0010:amddrm_sched_entity_flush+0x2d/0x1d0 [amd_sched]
[ 4388.760314] RSP: 0018:ffffa37b8166bd38 EFLAGS: 00010246
[ 4388.760337] RAX: 0000000000000000 RBX: ffff88776740e5f8 RCX: 0000000000000000
[ 4388.760366] RDX: 0000000000000000 RSI: 00000000000000fa RDI: ffff88776740e5f8
[ 4388.760396] RBP: ffffa37b8166bd88 R08: ffff8877765dab10 R09: 0000000000000000
[ 4388.760425] R10: 0000000000000000 R11: 0000000000000064 R12: 00000000000000fa
[ 4388.760455] R13: ffff8877606fdf18 R14: ffff8877606fdef8 R15: 00000000000000fa
[ 4388.760484] FS: 00007f05b21a1580(0000) GS:ffff8877765c0000(0000)
knlGS:0000000000000000
[ 4388.760518] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4388.760542] CR2: 0000000000000008 CR3: 000000003020a005 CR4: 00000000003606e0
[ 4388.760572] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4388.760601] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4388.760630] Call Trace:
[ 4388.760644] ? wait_woken+0x80/0x80
[ 4388.760701] amdgpu_ctx_mgr_entity_flush+0x7b/0xc0 [amdgpu]
[ 4388.760747] amdgpu_flush+0x23/0x30 [amdgpu]
[ 4388.760767] filp_close+0x2f/0x80
[ 4388.760782] put_files_struct+0x78/0xf0
[ 4388.760967] exit_files+0x49/0x50
[ 4388.760976] do_exit+0x2ca/0xb40
[ 4388.760985] ? __do_page_fault+0x270/0x4d0
[ 4388.760994] do_group_exit+0x43/0xb0
[ 4388.761003] SyS_exit_group+0x14/0x20
[ 4388.761013] do_syscall_64+0x73/0x130
[ 4388.761023] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 4388.761034] RIP: 0033:0x7f05b143fe06
[ 4388.761043] RSP: 002b:00007ffd0fde5fa8 EFLAGS: 00000246 ORIG_RAX:
00000000000000e7
[ 4388.761059] RAX: ffffffffffffffda RBX: 00007f05b1742740 RCX: 00007f05b143fe06
[ 4388.761074] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[ 4388.761088] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff80
[ 4388.761103] R10: 00007f05b135a140 R11: 0000000000000246 R12: 00007f05b1742740
[ 4388.761117] R13: 0000000000000001 R14: 00007f05b174b628 R15: 0000000000000000
[ 4388.761132] Code: 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 48 89 fb 49 89
f4 48 83 ec 30 65 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 48 8b 47 10 <4c> 8b
68 08 65 48 8b 04 25 00 5c 01 00 f6 40 24 04 0f 84 1b 01
[ 4388.761188] RIP: amddrm_sched_entity_flush+0x2d/0x1d0 [amd_sched] RSP:
ffffa37b8166bd38
[ 4388.761204] CR2: 0000000000000008
[ 4388.761212] ---[ end trace 7f1dd38e3cb86992 ]---
[ 4388.761222] Fixing recursive fault but reboot is needed!
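Reading the trace: the bytes around the fault in the Code: line,
`48 8b 47 10 <4c> 8b 68 08`, decode to a load from 0x10(%rdi) followed by a
load from 0x8(%rax). With RAX = 0 that gives the CR2 = 0x8 fault, so a pointer
stored in the first argument was NULL when the flush ran. That would fit an
entity that was never set up. Below is a minimal sketch of guarding against it,
in plain C with hypothetical stand-in structs. The demo_* names are not the
real scheduler types, and treating the NULL field as the entity's run queue is
an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; names and layout are assumptions. */
struct demo_sched {
	int id;
};

struct demo_rq {
	struct demo_sched *sched;
};

struct demo_entity {
	struct demo_rq *rq; /* NULL if the entity was never initialized */
};

#define DEMO_NUM_ENTITIES 4

struct demo_ctx {
	struct demo_entity entities[DEMO_NUM_ENTITIES];
};

/* Returns 1 if the entity was flushed, 0 if it was skipped because it
 * was never initialized. */
static int demo_entity_flush(struct demo_entity *entity)
{
	if (!entity || !entity->rq || !entity->rq->sched)
		return 0;

	/* A real flush would wait for queued jobs here. */
	return 1;
}

/* Flushes every entity of a context and returns how many were flushed. */
static int demo_ctx_flush(struct demo_ctx *ctx)
{
	int flushed = 0;
	size_t i;

	assert(ctx != NULL);

	for (i = 0; i < DEMO_NUM_ENTITIES; i++)
		flushed += demo_entity_flush(&ctx->entities[i]);

	return flushed;
}
```

Whether the check belongs in the flush path or the context should simply never
create entities for missing rings is a separate question; the sketch only shows
the guard.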
Regards,
Jerry
>
> Alex
>
>>
>> Regards,
>> Christian.
>>
>>
>> Regards,
>> Jerry
>>
>>
>> Regards,
>> Christian.
>>
>>
>> AFAIK, Windows also disables UVD and VCE during initialization.
>>
>> if ((adev->pdev->device == 0x67df) &&
>>     (adev->pdev->revision == 0xf7)) {
>> 	/* Some polaris12 variants don't support UVD/VCE */
>> } else {
>> 	amdgpu_device_ip_block_add(adev, &uvd_v6_3_ip_block);
>> 	amdgpu_device_ip_block_add(adev, &vce_v3_4_ip_block);
>> }
>>
>>
>>
>> OK, I will make the process explicit.
>>
>> Regards,
>> Jerry
>>
>> That way if we re-arrange the order later, it will be easier to track.
>>
>>
>> Alex
>>
>> ________________________________
>> From: amd-gfx <[email protected]> on behalf of Junwei
>> Zhang <[email protected]>
>> Sent: Friday, November 23, 2018 3:32:27 AM
>> To: [email protected]
>> Cc: Zhang, Jerry
>> Subject: [PATCH] drm/amdgpu: disable UVD/VCE for some polaris 12 variants
>>
>> Some variants don't support UVD and VCE.
>>
>> Signed-off-by: Junwei Zhang <[email protected]>
>> ---
>> drivers/gpu/drm/amd/amdgpu/vi.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
>> b/drivers/gpu/drm/amd/amdgpu/vi.c
>> index f3a4cf1f013a..3338b013ded4 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>> @@ -1660,6 +1660,10 @@ int vi_set_ip_blocks(struct amdgpu_device *adev)
>> amdgpu_device_ip_block_add(adev,
>> &dce_v11_2_ip_block);
>> amdgpu_device_ip_block_add(adev, &gfx_v8_0_ip_block);
>> amdgpu_device_ip_block_add(adev, &sdma_v3_1_ip_block);
>> + /* Some polaris12 variants don't support UVD/VCE */
>> + if ((adev->pdev->device == 0x67df) &&
>> + (adev->pdev->revision == 0xf7))
>> + break;
>> amdgpu_device_ip_block_add(adev, &uvd_v6_3_ip_block);
>> amdgpu_device_ip_block_add(adev, &vce_v3_4_ip_block);
>> break;
>> --
>> 2.17.1
>>
>> _______________________________________________
>> amd-gfx mailing list
>> [email protected]
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx