They are still not pushed - I am fixing review comments.

Andrey


On 12/26/2018 01:38 AM, Lou, Wentao wrote:
> Hi Andrey,
>
> In amd-staging-dkms-4.18's merged list, I can't find 'drm/sched: Refactor 
> ring mirror list handling' or 'drm/sched: Rework HW fence processing'.
> There are still many call traces in the new osdb, triggered in 
> dma_fence_set_error. Do you have a link to these patches?
> Thanks.
>
> BR,
> Wentao
>
>
> -----Original Message-----
> From: Grodzovsky, Andrey <andrey.grodzov...@amd.com>
> Sent: Saturday, December 22, 2018 12:57 AM
> To: Lou, Wentao <wentao....@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: dma_fence finished signaled by unexpected 
> callback
>
> I believe this issue would be resolved by my patch set that is pending 
> review, specifically 'drm/sched: Refactor ring mirror list handling', since 
> the first timeout (TO) handler will already go over all the rings, including 
> the second timed-out ring, and remove all callbacks, including the bad job's 
> callback. If the bad job has nevertheless signaled by this time for some 
> reason, it will already have been removed from the mirror list during 
> drm_sched_process_job (take a look at 'drm/sched: Rework HW fence 
> processing') and hence will not be rerun in drm_sched_job_recovery 
> (drm_sched_resubmit_jobs under the new name).
>
> Andrey
>
>
> On 12/21/2018 03:25 AM, wentalou wrote:
>> When 2 rings timed out at the same time, each triggered job_timedout separately.
>> Each job_timedout called gpu_recover, but one gpu_recover was blocked on the 
>> mutex_lock held by the other.
>> The bad job's callback should be removed by dma_fence_remove_callback, but that 
>> call sits behind the mutex_lock.
>> So dma_fence_remove_callback could not be called immediately.
>> The callback drm_sched_process_job was then triggered unexpectedly and set 
>> DMA_FENCE_FLAG_SIGNALED_BIT.
>> After the other's mutex_unlock, the signaled bad job went through job_run inside 
>> drm_sched_job_recovery.
>> job_run then hit a WARN_ON and call trace when calling 
>> kcl_dma_fence_set_error for the signaled bad job.
>>
>> Change-Id: I6366add13f020476882b2b8b03330a58d072dd1a
>> Signed-off-by: Wentao Lou <wentao....@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 5 ++++-
>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> index 0a17fb1..fc1d3a0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> @@ -225,8 +225,11 @@ static struct dma_fence *amdgpu_job_run(struct drm_sched_job *sched_job)
>>    
>>      trace_amdgpu_sched_run_job(job);
>>    
>> -    if (job->vram_lost_counter != atomic_read(&ring->adev->vram_lost_counter))
>> +    if (job->vram_lost_counter != atomic_read(&ring->adev->vram_lost_counter)) {
>> +            /* flags might be signaled by unexpected callback, clear it */
>> +            test_and_clear_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &finished->flags);
>>              dma_fence_set_error(finished, -ECANCELED);/* skip IB as well if VRAM lost */
>> +    }
>>    
>>      if (finished->error < 0) {
>>              DRM_INFO("Skip scheduling IBs!\n");
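
For readers following the thread, the sequence in the commit message can be modelled in user space. This is only a rough sketch, not kernel code: `model_fence`, `model_fence_signal`, `model_fence_set_error`, `model_test_and_clear_bit` and `model_run` are hypothetical stand-ins for `struct dma_fence`, the late `drm_sched_process_job` callback, `dma_fence_set_error`, `test_and_clear_bit` and the `amdgpu_job_run` path. It assumes only that setting an error on an already signaled fence warns, which is the WARN_ON the commit message reports.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for DMA_FENCE_FLAG_SIGNALED_BIT. */
#define MODEL_SIGNALED_BIT 0u

/* Hypothetical, heavily reduced stand-in for struct dma_fence. */
struct model_fence {
	unsigned long flags;
	int error;
};

/* Models the late callback marking the bad job's finished fence as signaled. */
static void model_fence_signal(struct model_fence *f)
{
	f->flags |= 1ul << MODEL_SIGNALED_BIT;
}

/* Non-atomic model of test_and_clear_bit(): returns the old bit value. */
static bool model_test_and_clear_bit(unsigned int bit, unsigned long *word)
{
	bool old = (*word >> bit) & 1ul;

	*word &= ~(1ul << bit);
	return old;
}

/* Models setting an error: warns if the fence is already signaled. */
static void model_fence_set_error(struct model_fence *f, int error, int *warnings)
{
	if ((f->flags >> MODEL_SIGNALED_BIT) & 1ul) {
		(*warnings)++;
		fprintf(stderr, "WARN: set_error on signaled fence\n");
	}
	f->error = error;
}

/*
 * Replays the reported ordering: the callback fires before it could be
 * removed, then the job is rerun and cancelled. Returns the number of
 * warnings raised. clear_first selects the behaviour of the proposed patch.
 */
static int model_run(bool clear_first)
{
	struct model_fence finished = { 0, 0 };
	int warnings = 0;

	model_fence_signal(&finished);	/* unexpected late callback */

	if (clear_first)
		model_test_and_clear_bit(MODEL_SIGNALED_BIT, &finished.flags);

	model_fence_set_error(&finished, -ECANCELED, &warnings);
	return warnings;
}
```

In this model `model_run(false)` raises one warning and `model_run(true)` raises none, which is the effect the patch aims for. Clearing the bit only hides the stale signal at the point of the warning; the reply above proposes instead removing the bad job's callback and mirror-list entry earlier, in the pending scheduler patches.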
