On Mon, Jul 07, 2025 at 08:30:02PM +0200, Christian König wrote: > On 07.07.25 20:12, Matthew Brost wrote: > >>> Then it occurs to me this looks like a common mistake to make. A little > >>> bit of git grep on dma_fence_wait_timeout() quickly finds multiple > >>> similar mistakes in drm, at least amdgpu, etnaviv, msm, and tegra. Cc > >>> some maintainers FYI. This class of bugs could cause issues elsewhere. > >> > >> I can only guess but I think all those use cases use a fixed small timeout > >> as well. IIRC amdgpu uses a timeout in the millisecond range. > >> > > > > Ah, no. A quick grep shows multiple cases in AMDGPU where long + > > MAX_SCHEDULE_TIMEOUT is used: > > > > - amdgpu_vm_cpu_update > > - amdgpu_vm_wait_idle > > - amdgpu_bo_kmap > > - amdgpu_hmm_invalidate_gfx > > - amdgpu_gem_wait_idle_ioctl > > > > I'm pretty confused by the pushback here—it's among the most confusing > > I've ever seen. > > Those cases are perfectly ok. long + MAX_SCHEDULE_TIMEOUT is correct. > > The problem only occurs when you use MAX_SCHEDULE_TIMEOUT and push the return > value into an int. >
Right. > > Again, this patch is almost certainly correct. I've made this mistake at > > least twice myself. > > The problem seems to be that i915 is using a small value, while XE is using > MAX_SCHEDULE_TIMEOUT. > > No idea why that difference is, but when you look only at the i915 code it > looks unnecessary. > Ah, ok. I try not to look at i915 code in Xe — this is about MAX_SCHEDULE_TIMEOUT. I see the confusion now. In general, though, for safety, the variable receiving the return value of dma_resv_wait_timeout() should be of type long rather than int, or things can accidentally (and silently) go sideways. Again this where a static checker or compile options would help. Matt > Regards, > Christian. > > >>> Let's also Cc Julia and Dan in case they have ideas to improve static > >>> analysis to catch this class of bugs. Or maybe one exists already, but > >>> we're just not running them! > >> > >> Yeah, agree. A script which checks if the input timeout could be >32bit > >> and the return value is assigned to something <=32bits is probably a > >> really good idea. > >> > > > > Yes, I agree. -Wstrict-overflow or -Wconversion in GCC would catch this, > > but enabling them causes the core kernel includes to quickly explode. A > > static checker would be a good solution here, or we could fix the Linux > > kernel so it compiles cleanly with these options. > > > > Matt > > > >> Regards, > >> Christian. > >> > >>> > >>> Finally, for the actual change, > >>> > >>> Reviewed-by: Jani Nikula <jani.nik...@intel.com> > >>> > >>> > >>>> Signed-off-by: Aakash Deep Sarkar <aakash.deep.sar...@intel.com> > >>>> --- > >>>> drivers/gpu/drm/i915/display/intel_display.c | 3 ++- > >>>> 1 file changed, 2 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/drivers/gpu/drm/i915/display/intel_display.c > >>>> b/drivers/gpu/drm/i915/display/intel_display.c > >>>> index 456fc4b04cda..7035c1fc9033 100644 > >>>> --- a/drivers/gpu/drm/i915/display/intel_display.c > >>>> +++ b/drivers/gpu/drm/i915/display/intel_display.c > >>>> @@ -7092,7 +7092,8 @@ static void intel_atomic_commit_fence_wait(struct > >>>> intel_atomic_state *intel_stat > >>>> struct drm_i915_private *i915 = to_i915(intel_state->base.dev); > >>>> struct drm_plane *plane; > >>>> struct drm_plane_state *new_plane_state; > >>>> - int ret, i; > >>>> + long ret; > >>>> + int i; > >>>> > >>>> for_each_new_plane_in_state(&intel_state->base, plane, > >>>> new_plane_state, i) { > >>>> if (new_plane_state->fence) { > >>> > >> >