Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

zhoucm1 Tue, 04 Sep 2018 00:54:27 -0700


On 2018年09月04日 15:00, Christian König wrote:

Am 04.09.2018 um 06:04 schrieb zhoucm1:
On 2018年09月03日 19:19, Christian König wrote:
Am 03.09.2018 um 12:07 schrieb Chunming Zhou:
在 2018/9/3 16:50, Christian König 写道:
Am 03.09.2018 um 06:13 schrieb Chunming Zhou:
在 2018/8/30 19:32, Christian König 写道:
[SNIP]
+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};
What are those two structures good for
They are used to record wait ops points and signal ops points.
For timeline, they are connected by timeline value, works like:
a. signal pt increase timeline value to signal_pt valuewhen signal_pt is signaled. signal_pt is depending onprevious pt fence and itself signal fence from signal ops.
    b. wait pt tree checks if timeline value over itself value.

For normal, works like:
a. signal pt increase 1 forsyncobj_timeline->signal_point every time when signal ops isperformed. b. when wait ops is comming, wait pt will fetch abovelast signal pt value as its wait point. wait pt will be onlysignaled by equal point value signal_pt.
and why is the stub fence their base?
Good question, I tried to kzalloc for them as well when Idebug them, I encountered a problem:I lookup/find wait_pt or signal_pt successfully, but when Itried to use them, they are freed sometimes, and results inNULL point.and generally, when lookup them, we often need their stubfence as well, and in the meantime, their lifecycle are same.
Above reason, I make stub fence as their base.
That sounds like you only did this because you messed up thelifecycle.
Additional to that I don't think you correctly considered thelifecycle of the waits and the sync object itself. E.g.blocking in drm_syncobj_timeline_fini() until all waits aredone is not a good idea.
What you should do instead is to create a fence_array objectwith all the fence we need to wait for when a wait point isrequested.
Yeah, this is our initial discussion result, but when I triedto do that, I found that cannot meet the advance signalrequirement: a. We need to consider the wait and signal pt value are notone-to-one match, it's difficult to find dependent point, atleast, there is some overhead.
As far as I can see that is independent of using a fence arrayhere. See you can either use a ring buffer or an rb-tree, butwhen you want to wait for a specific point we need to condensethe not yet signaled fences into an array.
again, need to find the range of where the specific point in, weshould close to timeline semantics, I also refered the sw_sync.ctimeline, usally wait_pt is signalled by timeline point. And Iagree we can implement it with several methods, but I don't thinkthere are basical differences.
The problem is that with your current approach you need thesync_obj alive for the synchronization to work. That is mostlikely not a good idea.
Indeed, I will fix that. How abount only creating fence array forevery wait pt when syncobj release? when syncobj release, wait ptmust have waited the signal opertation, then we can easily condensefences for every wait pt. And meantime, we can take timeline basedwait pt advantage.
That could work, but I don't see how you want to replace the alreadyissued fence with a fence_array when the sync object is destroyed.
Additional to that I would rather prefer a consistent handling, e.g.without extra rarely used code paths.
Ah, I find a easy way, we just need to make syncobj_timelinestructure as a reference. This way syncobj itself can be releasedfirst, wait_pt/signal_pt don't need syncobj at all.
every wait_pt/signal_pt keep a reference of syncobj_timeline.
I've thought about that as well, but came to the conclusion that yourun into problems because of circle dependencies.
E.g. sync_obj references sync_point and sync_point references sync_obj.

sync_obj can be freed first, only sync point references syncobj_timelinestructure, syncobj_timeline doesn't reference sync_pt, no circle dep.

Additional to that it is quite a bit larger memory footprint becauseyou need to keep the sync_obj around as well.

all signaled sync_pt are freed immediately except syncobj_timelinesturcture, where does extra memory foootprint take?

Additional to that you enable signaling without a need from thewaiting side. That is rather bad for implementations which needthat optimization.
Do you mean increasing timeline based on signal fence is notbetter? only update timeline value when requested by a wait pt?
Yes, exactly.
This way, we will not update timeline value immidiately and cannotfree signal pt immidiately, and we also need to consider it to CPUquery and wait.
That is actually the better coding style. We usually try to avoiddoing things in interrupt handlers as much as possible.
OK, I see your concern, how about to delay handling to a workqueue?this way, we only increase timeline value and wake up workqueue infence cb, is that acceptable?
A little bit, but not much better. The nice thing with garbagecollection is that the CPU time is accounted to the process causingthe garbage.
How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted node.
3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection andremove the first node from the tree as long as it is signaled.
5. When enable_signaling is requested for a node we cascade that tothe left using rb_prev. This ensures that signaling is enabled for the current fence aswell as all previous fences.
6. A wait just looks into the tree for the signal point lower orequal of the requested sequence number.

After re-thought your idea, I think it doesn't work since there is notimeline value as a line:signal pt value doesn't must be continues, which can be jump by signaloperation, like 1, 4, 8, 15, 19, e.g. there are five singal_pt,signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a waitpt is 7, do you mean this wait only needs signal_pt1 and signal_pt4??? That's certainly not right, we need to make sure the timeline value isbigger than wait pt value, that means signal_pt8 is need for wait_pt7.

7. When the sync object is released we userbtree_postorder_for_each_entry_safe() and drop the extra referenceto each node, but never call rb_erase! This way the rb_tree stays in memory, but without a root (e.g.the sync object). It only destructs itself when the looked upreferences to the nodes are dropped.

And here, who will destroy rb node since no one do enable_signaling, andthere is no callback to free themselves.


Regards,
David Zhou

Well that is quite a bunch of logic, but I think that should work fine.
Yeah, it could work, simple timeline reference also can solve 'free'problem.
I think this approach is still quite a bit better,

e.g. you don't run into circle dependency problems, it needs lessmemory and each node has always the same size which means we can use akmem_cache for it.
Regards,
Christian.
Thanks,
David Zhou


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

Reply via email to