Re: [Mesa-dev] [PATCH 03/15] dma-buf: Document dma-buf implicit fencing/resv fencing rules

2021-06-23 Thread Christian König

Am 22.06.21 um 18:54 schrieb Daniel Vetter:

Docs for struct dma_resv are fairly clear:

"A reservation object can have attached one exclusive fence (normally
associated with write operations) or N shared fences (read
operations)."

https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdri.freedesktop.org%2Fdocs%2Fdrm%2Fdriver-api%2Fdma-buf.html%23reservation-objectsdata=04%7C01%7Cchristian.koenig%40amd.com%7C42d8c70b62084a846e9508d9359e8629%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637599777264104449%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=rcd4i0VpK3BhwFLzsxd66OhshdJJh3yhRo2MOqlCEBo%3Dreserved=0

Furthermore a review across all of upstream.

First of render drivers and how they set implicit fences:

- nouveau follows this contract, see in validate_fini_no_ticket()

nouveau_bo_fence(nvbo, fence, !!b->write_domains);

   and that last boolean controls whether the exclusive or shared fence
   slot is used.

- radeon follows this contract by setting

p->relocs[i].tv.num_shared = !r->write_domain;

   in radeon_cs_parser_relocs(), which ensures that the call to
   ttm_eu_fence_buffer_objects() in radeon_cs_parser_fini() will do the
   right thing.

- vmwgfx seems to follow this contract with the shotgun approach of
   always setting ttm_val_buf->num_shared = 0, which means
   ttm_eu_fence_buffer_objects() will only use the exclusive slot.

- etnaviv follows this contract, as can be trivially seen by looking
   at submit_attach_object_fences()

- i915 is a bit a convoluted maze with multiple paths leading to
   i915_vma_move_to_active(). Which sets the exclusive flag if
   EXEC_OBJECT_WRITE is set. This can either come as a buffer flag for
   softpin mode, or through the write_domain when using relocations. It
   follows this contract.

- lima follows this contract, see lima_gem_submit() which sets the
   exclusive fence when the LIMA_SUBMIT_BO_WRITE flag is set for that
   bo

- msm follows this contract, see msm_gpu_submit() which sets the
   exclusive flag when the MSM_SUBMIT_BO_WRITE is set for that buffer

- panfrost follows this contract with the shotgun approach of just
   always setting the exclusive fence, see
   panfrost_attach_object_fences(). Benefits of a single engine I guess

- v3d follows this contract with the same shotgun approach in
   v3d_attach_fences_and_unlock_reservation(), but it has at least an
   XXX comment that maybe this should be improved

- v4c uses the same shotgun approach of always setting an exclusive
   fence, see vc4_update_bo_seqnos()

- vgem also follows this contract, see vgem_fence_attach_ioctl() and
   the VGEM_FENCE_WRITE. This is used in some igts to validate prime
   sharing with i915.ko without the need of a 2nd gpu

- vritio follows this contract again with the shotgun approach of
   always setting an exclusive fence, see virtio_gpu_array_add_fence()

This covers the setting of the exclusive fences when writing.

Synchronizing against the exclusive fence is a lot more tricky, and I
only spot checked a few:

- i915 does it, with the optional EXEC_OBJECT_ASYNC to skip all
   implicit dependencies (which is used by vulkan)

- etnaviv does this. Implicit dependencies are collected in
   submit_fence_sync(), again with an opt-out flag
   ETNA_SUBMIT_NO_IMPLICIT. These are then picked up in
   etnaviv_sched_dependency which is the
   drm_sched_backend_ops->dependency callback.

- v4c seems to not do much here, maybe gets away with it by not having
   a scheduler and only a single engine. Since all newer broadcom chips than
   the OG vc4 use v3d for rendering, which follows this contract, the
   impact of this issue is fairly small.

- v3d does this using the drm_gem_fence_array_add_implicit() helper,
   which then it's drm_sched_backend_ops->dependency callback
   v3d_job_dependency() picks up.

- panfrost is nice here and tracks the implicit fences in
   panfrost_job->implicit_fences, which again the
   drm_sched_backend_ops->dependency callback panfrost_job_dependency()
   picks up. It is mildly questionable though since it only picks up
   exclusive fences in panfrost_acquire_object_fences(), but not buggy
   in practice because it also always sets the exclusive fence. It
   should pick up both sets of fences, just in case there's ever going
   to be a 2nd gpu in a SoC with a mali gpu. Or maybe a mali SoC with a
   pcie port and a real gpu, which might actually happen eventually. A
   bug, but easy to fix. Should probably use the
   drm_gem_fence_array_add_implicit() helper.

- lima is nice an easy, uses drm_gem_fence_array_add_implicit() and
   the same schema as v3d.

- msm is mildly entertaining. It also supports MSM_SUBMIT_NO_IMPLICIT,
   but because it doesn't use the drm/scheduler it handles fences from
   the wrong context with a synchronous dma_fence_wait. See
   submit_fence_sync() leading to msm_gem_sync_object(). Investing into
   a 

[Mesa-dev] [PATCH 03/15] dma-buf: Document dma-buf implicit fencing/resv fencing rules

2021-06-22 Thread Daniel Vetter
Docs for struct dma_resv are fairly clear:

"A reservation object can have attached one exclusive fence (normally
associated with write operations) or N shared fences (read
operations)."

https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#reservation-objects

Furthermore a review across all of upstream.

First of render drivers and how they set implicit fences:

- nouveau follows this contract, see in validate_fini_no_ticket()

nouveau_bo_fence(nvbo, fence, !!b->write_domains);

  and that last boolean controls whether the exclusive or shared fence
  slot is used.

- radeon follows this contract by setting

p->relocs[i].tv.num_shared = !r->write_domain;

  in radeon_cs_parser_relocs(), which ensures that the call to
  ttm_eu_fence_buffer_objects() in radeon_cs_parser_fini() will do the
  right thing.

- vmwgfx seems to follow this contract with the shotgun approach of
  always setting ttm_val_buf->num_shared = 0, which means
  ttm_eu_fence_buffer_objects() will only use the exclusive slot.

- etnaviv follows this contract, as can be trivially seen by looking
  at submit_attach_object_fences()

- i915 is a bit a convoluted maze with multiple paths leading to
  i915_vma_move_to_active(). Which sets the exclusive flag if
  EXEC_OBJECT_WRITE is set. This can either come as a buffer flag for
  softpin mode, or through the write_domain when using relocations. It
  follows this contract.

- lima follows this contract, see lima_gem_submit() which sets the
  exclusive fence when the LIMA_SUBMIT_BO_WRITE flag is set for that
  bo

- msm follows this contract, see msm_gpu_submit() which sets the
  exclusive flag when the MSM_SUBMIT_BO_WRITE is set for that buffer

- panfrost follows this contract with the shotgun approach of just
  always setting the exclusive fence, see
  panfrost_attach_object_fences(). Benefits of a single engine I guess

- v3d follows this contract with the same shotgun approach in
  v3d_attach_fences_and_unlock_reservation(), but it has at least an
  XXX comment that maybe this should be improved

- v4c uses the same shotgun approach of always setting an exclusive
  fence, see vc4_update_bo_seqnos()

- vgem also follows this contract, see vgem_fence_attach_ioctl() and
  the VGEM_FENCE_WRITE. This is used in some igts to validate prime
  sharing with i915.ko without the need of a 2nd gpu

- vritio follows this contract again with the shotgun approach of
  always setting an exclusive fence, see virtio_gpu_array_add_fence()

This covers the setting of the exclusive fences when writing.

Synchronizing against the exclusive fence is a lot more tricky, and I
only spot checked a few:

- i915 does it, with the optional EXEC_OBJECT_ASYNC to skip all
  implicit dependencies (which is used by vulkan)

- etnaviv does this. Implicit dependencies are collected in
  submit_fence_sync(), again with an opt-out flag
  ETNA_SUBMIT_NO_IMPLICIT. These are then picked up in
  etnaviv_sched_dependency which is the
  drm_sched_backend_ops->dependency callback.

- v4c seems to not do much here, maybe gets away with it by not having
  a scheduler and only a single engine. Since all newer broadcom chips than
  the OG vc4 use v3d for rendering, which follows this contract, the
  impact of this issue is fairly small.

- v3d does this using the drm_gem_fence_array_add_implicit() helper,
  which then it's drm_sched_backend_ops->dependency callback
  v3d_job_dependency() picks up.

- panfrost is nice here and tracks the implicit fences in
  panfrost_job->implicit_fences, which again the
  drm_sched_backend_ops->dependency callback panfrost_job_dependency()
  picks up. It is mildly questionable though since it only picks up
  exclusive fences in panfrost_acquire_object_fences(), but not buggy
  in practice because it also always sets the exclusive fence. It
  should pick up both sets of fences, just in case there's ever going
  to be a 2nd gpu in a SoC with a mali gpu. Or maybe a mali SoC with a
  pcie port and a real gpu, which might actually happen eventually. A
  bug, but easy to fix. Should probably use the
  drm_gem_fence_array_add_implicit() helper.

- lima is nice an easy, uses drm_gem_fence_array_add_implicit() and
  the same schema as v3d.

- msm is mildly entertaining. It also supports MSM_SUBMIT_NO_IMPLICIT,
  but because it doesn't use the drm/scheduler it handles fences from
  the wrong context with a synchronous dma_fence_wait. See
  submit_fence_sync() leading to msm_gem_sync_object(). Investing into
  a scheduler might be a good idea.

- all the remaining drivers are ttm based, where I hope they do
  appropriately obey implicit fences already. I didn't do the full
  audit there because a) not follow the contract would confuse ttm
  quite well and b) reading non-standard scheduler and submit code
  which isn't based on drm/scheduler is a pain.

Onwards to the display side.

- Any driver using the drm_gem_plane_helper_prepare_fb() helper will
  correctly.