On Tue, 2025-11-25 at 09:03 +0100, Christian König wrote:
> On 11/25/25 08:55, Philipp Stanner wrote:
> > >  
> > > +/**
> > > + * define DMA_FENCE_MAX_REASONABLE_TIMEOUT - max reasonable signaling timeout
> > > + *
> > > + * The dma_fence object has a deep interdependency with core memory
> > > + * management; for a detailed explanation see section DMA Fences under
> > > + * Documentation/driver-api/dma-buf.rst.
> > > + *
> > > + * Because of this, all dma_fence implementations must guarantee that each fence
> > > + * completes in a finite time. This define gives a reasonable value for
> > > + * the timeout to use. It is possible to use a longer timeout in an
> > > + * implementation, but that should taint the kernel.
> > > + */
> > > +#define DMA_FENCE_MAX_REASONABLE_TIMEOUT (2*HZ)
> > 
> > HZ can change depending on the config. Is that really a good choice? I
> > could see racy situations arising in some configs vs. others.
> 
> 2*HZ is always two seconds expressed in number of jiffies; I can use
> msecs_to_jiffies(2000) to make that more obvious.

On AMD64 maybe. What about the other architectures?

> 
> The GPU scheduler has a very similar define, MAX_WAIT_SCHED_ENTITY_Q_EMPTY,
> which is currently just 1 second.
> 
> The real question is: what is the maximum amount of time we can wait for
> the HW before we should trigger a timeout?

That's a question only the drivers can answer, which is why I like to
think that setting global constants constraining all parties is not the
right thing to do.

What is even your motivation? What problem does this solve? Is the OOM
killer currently hanging for anyone? Can you link a bug report?

> 
> An AMD internal team is pushing for 10 seconds, but that also means that,
> for example, we wait 10 seconds for the OOM killer to do something. That
> sounds way too long.
> 

Nouveau uses a 10-second timeout. AFAIK we've never seen bugs because
of that. Have you seen any?


P.
