On 11/25/25 08:55, Philipp Stanner wrote:
> On Thu, 2025-11-20 at 15:41 +0100, Christian König wrote:
>> Add a define implementations can use as reasonable maximum signaling
>> timeout. Document that if this timeout is exceeded by config options
>> implementations should taint the kernel.
>>
>> Tainting the kernel is important for bug reports to detect that end
>> users might be using a problematic configuration.
>>
>> Signed-off-by: Christian König <[email protected]>
>> ---
>>  include/linux/dma-fence.h | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index 64639e104110..b31dfa501c84 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -28,6 +28,20 @@ struct dma_fence_ops;
>>  struct dma_fence_cb;
>>  struct seq_file;
>>  
>> +/**
>> + * define DMA_FENCE_MAX_REASONABLE_TIMEOUT - max reasonable signaling timeout
>> + *
>> + * The dma_fence object has a deep inter dependency with core memory
>> + * management, for a detailed explanation see section DMA Fences under
>> + * Documentation/driver-api/dma-buf.rst.
>> + *
>> + * Because of this all dma_fence implementations must guarantee that each fence
>> + * completes in a finite time. This define here now gives a reasonable value for
>> + * the timeout to use. It is possible to use a longer timeout in an
>> + * implementation but that should taint the kernel.
>> + */
>> +#define DMA_FENCE_MAX_REASONABLE_TIMEOUT (2*HZ)
> 
> HZ can change depending on the config. Is that really a good choice? I
> could see racy situations arising in some configs vs others

2*HZ is always two seconds expressed in jiffies, regardless of the configured
HZ value. I can use msecs_to_jiffies(2000) to make that more obvious.
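
As a rough illustration, here is a userspace sketch (not kernel code) of why
both spellings give the same tick count. The model_* helpers are hypothetical
names made up for this example, and the conversion is a simplified stand-in for
the kernel's msecs_to_jiffies(), assuming it rounds up to whole ticks. In the
kernel HZ is a compile-time constant; it is a parameter here only so that
several configurations can be compared.

```c
/*
 * Simplified userspace model of msecs_to_jiffies(): convert milliseconds
 * to ticks at a given tick rate, rounding up to a whole tick.
 * Hypothetical helper for illustration, not the kernel implementation.
 */
static inline unsigned long model_msecs_to_jiffies(unsigned long ms,
						   unsigned long hz)
{
	return (ms * hz + 999) / 1000;
}

/*
 * Model of the expression 2*HZ: hz ticks elapse per second, so two
 * seconds are always 2 * hz ticks, whatever hz is configured to.
 */
static inline unsigned long model_two_hz(unsigned long hz)
{
	return 2 * hz;
}
```

For tick rates of 100, 250, 300 and 1000 both helpers return 200, 500, 600 and
2000 ticks respectively, so the wall-clock duration stays at two seconds and
only the tick granularity differs between configs.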

The GPU scheduler has a very similar define, MAX_WAIT_SCHED_ENTITY_Q_EMPTY,
which is currently just 1 second.

The real question is what is the maximum amount of time we can wait for the HW 
before we should trigger a timeout?

An AMD-internal team is pushing for 10 seconds, but that also means that, for
example, we wait 10 seconds for the OOM killer to do something. That sounds
like way too long.

Regards,
Christian.

> 
> P.
