On 07/09/17 07:24 PM, Christian König wrote:
> On 07.09.2017 at 12:14, Marek Olšák wrote:
>> On Sep 7, 2017 12:08 PM, "Christian König" <deathsim...@vodafone.de> wrote:
>> On 07.09.2017 at 11:23, Michel Dänzer wrote:
>> On 01/09/17 07:40 PM, Christian König wrote:
>> On 01.09.2017 at 12:28, Michel Dänzer wrote:
>> On 01/09/17 07:23 PM, Nicolai Hähnle wrote:
>> On 01.09.2017 11:58, Michel Dänzer wrote:
>> On 29/08/17 11:47 PM, Christian König wrote:
>>
>> From: Marek Olšák <marek.ol...@amd.com>
>>
>> For lower overhead in the CS ioctl.
>> Winsys allocators are not used with interprocess-sharable resources.
>>
>> v2: It shouldn't crash anymore, but the kernel will reject the new flag.
>> v3 (christian): Rename the flag, avoid sending those buffers in the BO list.
>> v4 (christian): Remove setting the kernel flag for now
>>
>> This change seems to have caused a GPU hang when running piglit on my
>> Kaveri with the radeon kernel driver.
>>
>> I think we can remove "seems to have". I'm still reliably getting the
>> GPUVM fault and hang with current master, but not if I revert this
>> commit (and the one after it).
>>
>> Haven't been able to isolate it to a specific test, seems to only
>> happen when running multiple tests concurrently.
>>
>> I reproduced the problem with piglit process separation enabled as well,
>> and all four tests running when it hung were textureGather tests.
>> Before, reproducing the problem twice with piglit process separation
>> disabled, three textureGather tests were running when it hung both times
>> as well. I've been unable to reproduce the problem by manually running
>> the same textureGather tests in parallel though.
>>
>> There's a GPUVM fault before the hang, I suspect it's related:
>>
>> radeon 0000:00:01.0: GPU fault detected: 146 0x0ae6760c
>> radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000001D7
>> radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0607600C
>> VM fault (0x0c, vmid 3) at page 471, read from 'CPF' (0x43504600) (118)
>>
>> Any ideas?
>>
>> Not the slightest, but I'm still investigating problems with that on
>> amdgpu.
>>
>> If we can't find the root cause till Monday it might be a good idea to
>> revert the patches for now.
>>
>> What's the status on that?
>>
>> I've found and fixed the remaining kernel bugs over the last
>> weekend/beginning of this week.
>>
>> Still need to commit the fix for UVD/VCE, but that one shouldn't
>> affect GFX at all.
>>
>> Michel is seeing hangs on the radeon KMD, which should be unaffected
>> by your kernel work I think.
>>
>> We could revert this to unbreak Michel's Kaveri,

FWIW, there's no need to do anything for my Kaveri development system in
particular; it's going out of service soon, and in the meantime I can
revert these changes locally. My concern is that the underlying issue
might cause other problems in real world scenarios.

>> but I think it shouldn't be so difficult to find the culprit in this
>> patch if there is one.
>
> The only crux is that the userspace patch shouldn't affect radeon at
> all. So the real question is what the heck is going on here?

Maybe some buffers that were previously allocated directly are now
sub-allocated or re-used from the BO cache, or vice versa, or something
like that?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer