On 6/3/25 13:30, Tvrtko Ursulin wrote: > > On 02/06/2025 19:00, Christian König wrote: >> On 6/2/25 17:25, Tvrtko Ursulin wrote: >>> >>> On 02/06/2025 15:42, Christian König wrote: >>>> On 6/2/25 15:05, Tvrtko Ursulin wrote: >>>>> >>>>> Hi, >>>>> >>>>> On 15/05/2025 14:15, Christian König wrote: >>>>>> Hey drm-misc maintainers, >>>>>> >>>>>> can you guys please backmerge drm-next into drm-misc-next? >>>>>> >>>>>> I want to push this patch here but it depends on changes which are >>>>>> partially in drm-next and partially in drm-misc-next. >>>>> >>>>> Looks like the backmerge is still pending? >>>> >>>> Yes, @Maarten, @Maxime and @Thomas ping on this. >>>> >>>>> In the meantime, Christian, any chance you will have some bandwith to >>>>> think about the tail end of the series? Specifically patch 6 and how that >>>>> is used onward. >>>> >>>> Well the RCU grace period is quite a nifty hack. I wanted to go over it >>>> again after merging the first patches from this series. >>>> >>>> In general looks like a good idea to me, I just don't like that we >>>> explicitely need to expose dma_fence_access_begin() and >>>> dma_fence_access_end(). >>>> >>>> Especially we can't do that while calling fence->ops->release. >>> >>> Hm why not? You think something will take offence of the rcu_read_lock()? >> >> Yes, especially it is perfectly legitimate to call synchronize_rcu() or lock >> semaphores/mutexes from that callback. >> >> Either keep the RCU critical section only for the trace or even better come >> up with some different approach, e.g. copying the string under the RCU lock >> or something like that. > > Hmm but the kerneldoc explicity says callback can be called from irq context: > > /** > * @release: > * > * Called on destruction of fence to release additional resources. > * Can be called from irq context. This callback is optional. If it is > * NULL, then dma_fence_free() is instead called as the default > * implementation. > */ > void (*release)(struct dma_fence *fence);
Ah, right. I mixed that up with the dma-buf object. Yeah in that case that is probably harmless. We delegate the final free to a work item if necessary anyway. But I would still like to avoid having the RCU cover the release as well. Or why is there any reason why we would explicitely want to do this? Regards, Christian. > > > Regards, > > Tvrtko > >> >> Regards, >> Christian. >> >>> >>> Regards, >>> >>> Tvrtko >>> >>>>>> On 5/15/25 11:49, Tvrtko Ursulin wrote: >>>>>>> With the goal of reducing the need for drivers to touch (and >>>>>>> dereference) >>>>>>> fence->ops, we move the 64-bit seqnos flag from struct dma_fence_ops to >>>>>>> the fence->flags. >>>>>>> >>>>>>> Drivers which were setting this flag are changed to use new >>>>>>> dma_fence_init64() instead of dma_fence_init(). >>>>>>> >>>>>>> v2: >>>>>>> * Streamlined init and added kerneldoc. >>>>>>> * Rebase for amdgpu userq which landed since. >>>>>>> >>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursu...@igalia.com> >>>>>>> Reviewed-by: Christian König <christian.koe...@amd.com> # v1 >>>>>>> --- >>>>>>> drivers/dma-buf/dma-fence-chain.c | 5 +- >>>>>>> drivers/dma-buf/dma-fence.c | 69 >>>>>>> ++++++++++++++----- >>>>>>> .../drm/amd/amdgpu/amdgpu_eviction_fence.c | 7 +- >>>>>>> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 5 +- >>>>>>> .../gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c | 5 +- >>>>>>> include/linux/dma-fence.h | 14 ++-- >>>>>>> 6 files changed, 64 insertions(+), 41 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/dma-buf/dma-fence-chain.c >>>>>>> b/drivers/dma-buf/dma-fence-chain.c >>>>>>> index 90424f23fd73..a8a90acf4f34 100644 >>>>>>> --- a/drivers/dma-buf/dma-fence-chain.c >>>>>>> +++ b/drivers/dma-buf/dma-fence-chain.c >>>>>>> @@ -218,7 +218,6 @@ static void dma_fence_chain_set_deadline(struct >>>>>>> dma_fence *fence, >>>>>>> } >>>>>>> const struct dma_fence_ops dma_fence_chain_ops = { >>>>>>> - .use_64bit_seqno = true, >>>>>>> .get_driver_name = dma_fence_chain_get_driver_name, >>>>>>> .get_timeline_name = dma_fence_chain_get_timeline_name, >>>>>>> .enable_signaling = dma_fence_chain_enable_signaling, >>>>>>> @@ -262,8 +261,8 @@ void dma_fence_chain_init(struct dma_fence_chain >>>>>>> *chain, >>>>>>> seqno = max(prev->seqno, seqno); >>>>>>> } >>>>>>> - dma_fence_init(&chain->base, &dma_fence_chain_ops, >>>>>>> - &chain->lock, context, seqno); >>>>>>> + dma_fence_init64(&chain->base, &dma_fence_chain_ops, &chain->lock, >>>>>>> + context, seqno); >>>>>>> /* >>>>>>> * Chaining dma_fence_chain container together is only allowed >>>>>>> through >>>>>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c >>>>>>> index f0cdd3e99d36..705b59787731 100644 >>>>>>> --- a/drivers/dma-buf/dma-fence.c >>>>>>> +++ b/drivers/dma-buf/dma-fence.c >>>>>>> @@ -989,24 +989,9 @@ void dma_fence_describe(struct dma_fence *fence, >>>>>>> struct seq_file *seq) >>>>>>> } >>>>>>> EXPORT_SYMBOL(dma_fence_describe); >>>>>>> -/** >>>>>>> - * dma_fence_init - Initialize a custom fence. >>>>>>> - * @fence: the fence to initialize >>>>>>> - * @ops: the dma_fence_ops for operations on this fence >>>>>>> - * @lock: the irqsafe spinlock to use for locking this fence >>>>>>> - * @context: the execution context this fence is run on >>>>>>> - * @seqno: a linear increasing sequence number for this context >>>>>>> - * >>>>>>> - * Initializes an allocated fence, the caller doesn't have to keep its >>>>>>> - * refcount after committing with this fence, but it will need to hold >>>>>>> a >>>>>>> - * refcount again if &dma_fence_ops.enable_signaling gets called. >>>>>>> - * >>>>>>> - * context and seqno are used for easy comparison between fences, >>>>>>> allowing >>>>>>> - * to check which fence is later by simply using dma_fence_later(). >>>>>>> - */ >>>>>>> -void >>>>>>> -dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops >>>>>>> *ops, >>>>>>> - spinlock_t *lock, u64 context, u64 seqno) >>>>>>> +static void >>>>>>> +__dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops >>>>>>> *ops, >>>>>>> + spinlock_t *lock, u64 context, u64 seqno, unsigned long >>>>>>> flags) >>>>>>> { >>>>>>> BUG_ON(!lock); >>>>>>> BUG_ON(!ops || !ops->get_driver_name || >>>>>>> !ops->get_timeline_name); >>>>>>> @@ -1017,9 +1002,55 @@ dma_fence_init(struct dma_fence *fence, const >>>>>>> struct dma_fence_ops *ops, >>>>>>> fence->lock = lock; >>>>>>> fence->context = context; >>>>>>> fence->seqno = seqno; >>>>>>> - fence->flags = 0UL; >>>>>>> + fence->flags = flags; >>>>>>> fence->error = 0; >>>>>>> trace_dma_fence_init(fence); >>>>>>> } >>>>>>> + >>>>>>> +/** >>>>>>> + * dma_fence_init - Initialize a custom fence. >>>>>>> + * @fence: the fence to initialize >>>>>>> + * @ops: the dma_fence_ops for operations on this fence >>>>>>> + * @lock: the irqsafe spinlock to use for locking this fence >>>>>>> + * @context: the execution context this fence is run on >>>>>>> + * @seqno: a linear increasing sequence number for this context >>>>>>> + * >>>>>>> + * Initializes an allocated fence, the caller doesn't have to keep its >>>>>>> + * refcount after committing with this fence, but it will need to hold >>>>>>> a >>>>>>> + * refcount again if &dma_fence_ops.enable_signaling gets called. >>>>>>> + * >>>>>>> + * context and seqno are used for easy comparison between fences, >>>>>>> allowing >>>>>>> + * to check which fence is later by simply using dma_fence_later(). >>>>>>> + */ >>>>>>> +void >>>>>>> +dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops >>>>>>> *ops, >>>>>>> + spinlock_t *lock, u64 context, u64 seqno) >>>>>>> +{ >>>>>>> + __dma_fence_init(fence, ops, lock, context, seqno, 0UL); >>>>>>> +} >>>>>>> EXPORT_SYMBOL(dma_fence_init); >>>>>>> + >>>>>>> +/** >>>>>>> + * dma_fence_init64 - Initialize a custom fence with 64-bit seqno >>>>>>> support. >>>>>>> + * @fence: the fence to initialize >>>>>>> + * @ops: the dma_fence_ops for operations on this fence >>>>>>> + * @lock: the irqsafe spinlock to use for locking this fence >>>>>>> + * @context: the execution context this fence is run on >>>>>>> + * @seqno: a linear increasing sequence number for this context >>>>>>> + * >>>>>>> + * Initializes an allocated fence, the caller doesn't have to keep its >>>>>>> + * refcount after committing with this fence, but it will need to hold >>>>>>> a >>>>>>> + * refcount again if &dma_fence_ops.enable_signaling gets called. >>>>>>> + * >>>>>>> + * Context and seqno are used for easy comparison between fences, >>>>>>> allowing >>>>>>> + * to check which fence is later by simply using dma_fence_later(). >>>>>>> + */ >>>>>>> +void >>>>>>> +dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops >>>>>>> *ops, >>>>>>> + spinlock_t *lock, u64 context, u64 seqno) >>>>>>> +{ >>>>>>> + __dma_fence_init(fence, ops, lock, context, seqno, >>>>>>> + BIT(DMA_FENCE_FLAG_SEQNO64_BIT)); >>>>>>> +} >>>>>>> +EXPORT_SYMBOL(dma_fence_init64); >>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c >>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c >>>>>>> index 1a7469543db5..79713421bffe 100644 >>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c >>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c >>>>>>> @@ -134,7 +134,6 @@ static bool >>>>>>> amdgpu_eviction_fence_enable_signaling(struct dma_fence *f) >>>>>>> } >>>>>>> static const struct dma_fence_ops amdgpu_eviction_fence_ops = { >>>>>>> - .use_64bit_seqno = true, >>>>>>> .get_driver_name = amdgpu_eviction_fence_get_driver_name, >>>>>>> .get_timeline_name = amdgpu_eviction_fence_get_timeline_name, >>>>>>> .enable_signaling = amdgpu_eviction_fence_enable_signaling, >>>>>>> @@ -160,9 +159,9 @@ amdgpu_eviction_fence_create(struct >>>>>>> amdgpu_eviction_fence_mgr *evf_mgr) >>>>>>> ev_fence->evf_mgr = evf_mgr; >>>>>>> get_task_comm(ev_fence->timeline_name, current); >>>>>>> spin_lock_init(&ev_fence->lock); >>>>>>> - dma_fence_init(&ev_fence->base, &amdgpu_eviction_fence_ops, >>>>>>> - &ev_fence->lock, evf_mgr->ev_fence_ctx, >>>>>>> - atomic_inc_return(&evf_mgr->ev_fence_seq)); >>>>>>> + dma_fence_init64(&ev_fence->base, &amdgpu_eviction_fence_ops, >>>>>>> + &ev_fence->lock, evf_mgr->ev_fence_ctx, >>>>>>> + atomic_inc_return(&evf_mgr->ev_fence_seq)); >>>>>>> return ev_fence; >>>>>>> } >>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c >>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c >>>>>>> index 029cb24c28b3..5e92d00a591f 100644 >>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c >>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c >>>>>>> @@ -239,8 +239,8 @@ static int amdgpu_userq_fence_create(struct >>>>>>> amdgpu_usermode_queue *userq, >>>>>>> fence = &userq_fence->base; >>>>>>> userq_fence->fence_drv = fence_drv; >>>>>>> - dma_fence_init(fence, &amdgpu_userq_fence_ops, >>>>>>> &userq_fence->lock, >>>>>>> - fence_drv->context, seq); >>>>>>> + dma_fence_init64(fence, &amdgpu_userq_fence_ops, >>>>>>> &userq_fence->lock, >>>>>>> + fence_drv->context, seq); >>>>>>> amdgpu_userq_fence_driver_get(fence_drv); >>>>>>> dma_fence_get(fence); >>>>>>> @@ -334,7 +334,6 @@ static void amdgpu_userq_fence_release(struct >>>>>>> dma_fence *f) >>>>>>> } >>>>>>> static const struct dma_fence_ops amdgpu_userq_fence_ops = { >>>>>>> - .use_64bit_seqno = true, >>>>>>> .get_driver_name = amdgpu_userq_fence_get_driver_name, >>>>>>> .get_timeline_name = amdgpu_userq_fence_get_timeline_name, >>>>>>> .signaled = amdgpu_userq_fence_signaled, >>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c >>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c >>>>>>> index 51cddfa3f1e8..5d26797356a3 100644 >>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c >>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c >>>>>>> @@ -71,7 +71,6 @@ static void amdgpu_tlb_fence_work(struct work_struct >>>>>>> *work) >>>>>>> } >>>>>>> static const struct dma_fence_ops amdgpu_tlb_fence_ops = { >>>>>>> - .use_64bit_seqno = true, >>>>>>> .get_driver_name = amdgpu_tlb_fence_get_driver_name, >>>>>>> .get_timeline_name = amdgpu_tlb_fence_get_timeline_name >>>>>>> }; >>>>>>> @@ -101,8 +100,8 @@ void amdgpu_vm_tlb_fence_create(struct >>>>>>> amdgpu_device *adev, struct amdgpu_vm *vm >>>>>>> INIT_WORK(&f->work, amdgpu_tlb_fence_work); >>>>>>> spin_lock_init(&f->lock); >>>>>>> - dma_fence_init(&f->base, &amdgpu_tlb_fence_ops, &f->lock, >>>>>>> - vm->tlb_fence_context, atomic64_read(&vm->tlb_seq)); >>>>>>> + dma_fence_init64(&f->base, &amdgpu_tlb_fence_ops, &f->lock, >>>>>>> + vm->tlb_fence_context, atomic64_read(&vm->tlb_seq)); >>>>>>> /* TODO: We probably need a separate wq here */ >>>>>>> dma_fence_get(&f->base); >>>>>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h >>>>>>> index 48b5202c531d..a34a0dcdc446 100644 >>>>>>> --- a/include/linux/dma-fence.h >>>>>>> +++ b/include/linux/dma-fence.h >>>>>>> @@ -97,6 +97,7 @@ struct dma_fence { >>>>>>> }; >>>>>>> enum dma_fence_flag_bits { >>>>>>> + DMA_FENCE_FLAG_SEQNO64_BIT, >>>>>>> DMA_FENCE_FLAG_SIGNALED_BIT, >>>>>>> DMA_FENCE_FLAG_TIMESTAMP_BIT, >>>>>>> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, >>>>>>> @@ -124,14 +125,6 @@ struct dma_fence_cb { >>>>>>> * >>>>>>> */ >>>>>>> struct dma_fence_ops { >>>>>>> - /** >>>>>>> - * @use_64bit_seqno: >>>>>>> - * >>>>>>> - * True if this dma_fence implementation uses 64bit seqno, false >>>>>>> - * otherwise. >>>>>>> - */ >>>>>>> - bool use_64bit_seqno; >>>>>>> - >>>>>>> /** >>>>>>> * @get_driver_name: >>>>>>> * >>>>>>> @@ -262,6 +255,9 @@ struct dma_fence_ops { >>>>>>> void dma_fence_init(struct dma_fence *fence, const struct >>>>>>> dma_fence_ops *ops, >>>>>>> spinlock_t *lock, u64 context, u64 seqno); >>>>>>> +void dma_fence_init64(struct dma_fence *fence, const struct >>>>>>> dma_fence_ops *ops, >>>>>>> + spinlock_t *lock, u64 context, u64 seqno); >>>>>>> + >>>>>>> void dma_fence_release(struct kref *kref); >>>>>>> void dma_fence_free(struct dma_fence *fence); >>>>>>> void dma_fence_describe(struct dma_fence *fence, struct seq_file >>>>>>> *seq); >>>>>>> @@ -454,7 +450,7 @@ static inline bool __dma_fence_is_later(struct >>>>>>> dma_fence *fence, u64 f1, u64 f2) >>>>>>> * 32bit sequence numbers. Use a 64bit compare when the driver >>>>>>> says to >>>>>>> * do so. >>>>>>> */ >>>>>>> - if (fence->ops->use_64bit_seqno) >>>>>>> + if (test_bit(DMA_FENCE_FLAG_SEQNO64_BIT, &fence->flags)) >>>>>>> return f1 > f2; >>>>>>> return (int)(lower_32_bits(f1) - lower_32_bits(f2)) > 0; >>>>>> >>>>> >>>> >>> >> >