Hey,

Den 2025-09-29 kl. 14:42, skrev Janusz Krzysztofik:
> Hi Christian,
> 
> Thank you for looking at it and providing your R-b.
> 
> On Sunday, 28 September 2025 16:00:57 CEST Christian König wrote:
>> On 26.09.25 17:26, Janusz Krzysztofik wrote:
>>> A timer that expires a vgem fence automatically in 10 seconds is now
>>> released with timer_delete_sync() from fence->ops.release() called on last
>>> dma_fence_put().  In some scenarios, it can run in IRQ context, which is
>>> not safe unless TIMER_IRQSAFE is used.  One potentially risky scenario was
>>> demonstrated in Intel DRM CI trybot, BAT run on machine bat-adlp-6, while
>>> working on new IGT subtests syncobj_timeline@stress-* as user space
>>> replacements of some problematic test cases of a dma-fence-chain selftest
> ...
>>> Make the timer IRQ safe.
>>>
>>> [1] https://patchwork.freedesktop.org/series/154987/#rev2
>>>
>>> Fixes: 4077798484459 ("drm/vgem: Attach sw fences to exported vGEM dma-buf 
>>> (ioctl)")
>>> Signed-off-by: Janusz Krzysztofik <[email protected]>
>>
>> Reviewed-by: Christian König <[email protected]>
>>
>> We should also consider to lower the timeout massively. This needs to be in 
>> the range of 100m-1s to not cause the same trouble as the sw_sync framework.
> 
> Assuming you are referring to potential hard lockups observed in sw_sync 
> based 
> version of the exercise, which piece of kernel code you would expect to loop 
> for too long with interrupts disabled due to the vgem fences auto expiry 
> timeout of 10s?
> 
>>
>> 10seconds is just way to long for that.
> 
> Do you think DRM objects can't be busy for that long in real life?  What 
> about 
> long running GPU compute tasks?
> 
> Thanks,
> Janusz
> 
> 
>>
>> Regards,
>> Christian.
>>
>>> ---
>>>  drivers/gpu/drm/vgem/vgem_fence.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/vgem/vgem_fence.c 
>>> b/drivers/gpu/drm/vgem/vgem_fence.c
>>> index fd76730fd38c0..07db319c3d7f9 100644
>>> --- a/drivers/gpu/drm/vgem/vgem_fence.c
>>> +++ b/drivers/gpu/drm/vgem/vgem_fence.c
>>> @@ -79,7 +79,7 @@ static struct dma_fence *vgem_fence_create(struct 
>>> vgem_file *vfile,
>>>     dma_fence_init(&fence->base, &vgem_fence_ops, &fence->lock,
>>>                    dma_fence_context_alloc(1), 1);
>>>  
>>> -   timer_setup(&fence->timer, vgem_fence_timeout, 0);
>>> +   timer_setup(&fence->timer, vgem_fence_timeout, TIMER_IRQSAFE);
>>>  
>>>     /* We force the fence to expire within 10s to prevent driver hangs */
>>>     mod_timer(&fence->timer, jiffies + VGEM_FENCE_TIMEOUT);
>>
>>
> 
> 
> 
> 
Pushed the ptach, thanks.

I'm imagining it's mostly because of potential chaining of multiple fences, 
which can worsen the worst-case timeout easily.

Kind regards,
~Maarten

Reply via email to