On 24/08/16 20:03, Cedric BAIL wrote:
> On Wed, Aug 24, 2016 at 2:24 AM, Tom Hacohen <t...@osg.samsung.com> wrote:
>> On 23/08/16 18:51, Cedric BAIL wrote:
>>> On Tue, Aug 23, 2016 at 3:31 AM, Tom Hacohen <t...@osg.samsung.com> wrote:
>
> <snip>
>
>>>> However, while they provide a nice memory improvement, they have been
>>>> hampering many optimisation strategies that would make callback
>>>> invocation significantly faster. Furthermore, maybe (not sure), we can
>>>> automatically de-duplicate event lists internally (more on that in a
>>>> moment). With that being said, there is a way we can maybe keep array
>>>> callbacks with some limitations.
>>>
>>> Do you have a case where performance is impacted by callbacks today?
>>> I have found that we usually have a very small number of callbacks
>>> (likely in an array these days), and when speed did really matter it
>>> was just best to not trigger the callback at all (that's why we have
>>> this code in many places that counts if any callback has been registered).
>>
>> It always showed up in callgrind. Obviously less so after the changes
>> you made that improved things, because you essentially just don't call
>> that code anymore, but having to do this everywhere is a bit of a pain,
>> especially if we can just make callbacks fast on their own.
>>
>> Callback_call takes around 1.5% in the efl atm. Though if we remove the
>> not-call optimisations it would be much more again. I wonder if we can
>> reach good results without it.
>
> When genlist is scrolling, just calling a function is costly, as we end
> up calling it millions of times, literally. I seriously doubt it is
> possible.
>

And yet, this is one of the functions that stands out, unlike others that
are "just called".

>>  From my tests back when I was optimising callback invocation, we had
>> around 5 callbacks on average on objects with a non-zero number of
>> registered callbacks, with a maximum of around 12 if my memory
>> serves, so this could potentially make callback calls so fast that any
>> optimisations won't matter.
>
> These numbers are from before callback arrays. I am seriously
> interested to know today's numbers. Also, an improved statistic would be
> to know how many callbacks are walked over in the most-called case and
> how many of those callbacks are actually in an array already.
>
> <snip>

Callback array or not, you still end up walking all of the callbacks...

>
>>>> We can also store a pointer to the array in a hash table with the key
>>>> being some sort of a hash of the array in order to do some deduplication
>>>> afterwards (point to the same arrays, but obviously different private
>>>> data, so that would still be duplicated) if we feel it's needed. It
>>>> probably won't save as much though and will have some running costs.
>>>
>>> For anything < 16 entries, I bet that a hash table will be slower than
>>> walking an array. Don't forget you need to compute the hash key, jump
>>> into an array, walk down an rbtree and finally iterate over a list.
>>> Hashes are good for very large numbers of objects, not for small numbers.
>>
>> That was an optimisation that I just threw out there to the world, but I
>> believe you misunderstood me. I didn't mean we create a hash table for
>> calling events, it was for saving memory and deduplicating event
>> callbacks (essentially callback arrays automatically). This is only done
>> on callback add/del.
>
> Indeed, I misunderstood your intent. Still, this will increase the cost
> of insertion for no benefit, in my opinion. See below.

Again, this is a side comment, not that important.

>
>>>> The last idea is to keep callback arrays, but kind of limit their scope.
>>>> The problem (or at least one of them) is that callback arrays support
>>>> setting a priority which means calling them needs to be in between the
>>>> calls to normal callbacks. This adds a lot of complexity (this is a very
>>>> hot path, even a simple if is complexity, but this adds more). If we
>>>> define that all callback arrays are always the lowest priority (called
>>>> last), which in practice will have almost zero impact if at all, we can
>>>> just keep them, and just call them after we do the normal callback calls
>>>> (if they exist). We can even optimise further by not making the arrays
>>>> constant, and thus letting us sort them and then run the same algorithm
>>>> mentioned above for searching. This is probably the most acceptable
>>>> compromise, though I'm not sure if it'll block any future optimisation
>>>> attempts that I'm not able to foresee.
>>>
>>> No! Arrays are only useful if they are constant! That is the only way
>>> to share them across all instances of objects. Their size being
>>> ridiculously small, I bet you won't win anything by reordering them.
>>> And if you really want to reorder them, you can do that once at
>>> creation time in the inline function that creates them, as defined in
>>> Eo.h.
>>
>> That is absolutely untrue. You can reorder them where they are created
>> (like you suggested), or reorder them when they are added and still
>> share them. You'll only need to reorder once, after that, when they are
>> in order, that's it. Const doesn't matter or help at all. Obviously
>> you're expected not to change them.
>
> If the array is not const, then you have to allocate it every time you
> register it. This has a direct cost, on top of the fact that you then
> have to sort it, hash, compare and maybe free it. I seriously doubt the
> wisdom of doing so.
>
> As said above, sort it at creation, add debug code that will warn when
> inserting an unsorted array (code that will be disabled in production),
> and just improve walking over those sorted arrays. I bet that will be
> enough of a speedup for our real use case, if there is any (see below).

Either you are missing something or I am. First of all, yes, it is
better to sort on creation; we agree on that.

Const or not, in both cases it's going to be a static array, so
allocated once. I don't see how const would change that. I also don't
understand what you mean by hash and compare. I think you are confusing
this with my previous optimisation suggestion (please strike it out of
your memory), which has *nothing* to do with hashing callback arrays, at
least not in this case.
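To make the point concrete, here is a minimal sketch of a static callback array sorted once, at creation, by descriptor address, which a dichotomic search can then exploit at call time. The types and names (`Event_Desc`, `Cb_Array_Item`, `cb_array_find`) are hypothetical simplifications for illustration, not the real Eo API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real Eo types: an event is
 * identified by its description pointer, and an array entry pairs a
 * description with a callback. This is a sketch, not the actual EFL API. */
typedef struct { const char *name; } Event_Desc;
typedef void (*Event_Cb)(void *data);
typedef struct { const Event_Desc *desc; Event_Cb func; } Cb_Array_Item;

/* Because the array is sorted by descriptor address at creation, a
 * lookup can binary-search instead of walking every entry. */
static int
cb_array_find(const Cb_Array_Item *arr, size_t count, const Event_Desc *desc)
{
   size_t lo = 0, hi = count;

   while (lo < hi)
     {
        size_t mid = lo + (hi - lo) / 2;

        if (arr[mid].desc == desc) return (int)mid;
        if ((uintptr_t)arr[mid].desc < (uintptr_t)desc) lo = mid + 1;
        else hi = mid;
     }
   return -1; /* descriptor not registered in this array */
}

static void noop(void *data) { (void)data; }

/* The descriptors live in one array, so their addresses are ordered;
 * the callback array below is therefore sorted by descriptor address. */
static const Event_Desc descs[3] = { {"clicked"}, {"moved"}, {"deleted"} };
static const Cb_Array_Item items[3] = {
   { &descs[0], noop }, { &descs[1], noop }, { &descs[2], noop },
};
```

A lookup touches O(log n) entries instead of all of them, and nothing here requires the array to be const; it only requires that nobody reorders it after creation, which is exactly the debug-check-on-insert scheme discussed above.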

>
>>>> I'm not a huge fan of callback arrays, but if they do save the memory
>>>> they claim to be saving, I see no problem with keeping a more limited
>>>> version of them that let us optimise everything in the manner described
>>>> above.
>>>
>>> I am not a huge fan of optimization without a clear real-life case.
>>> Please share numbers and scenarios of when it does matter. I have seen
>>> enough people wasting their time optimizing things that don't matter
>>> that I really take it with a grain of salt if you are not showing a
>>> real-life scenario. Sharing a callgrind trace or something along those
>>> lines would really help make your point here.
>>
>> As I said, it's ~1.5% of the EFL CPU usage when scrolling around
>> genlist. It also wastes memory to have them support priority. And as
>> your changes proved, there is a reason to minimise callback calls, so
>> we already have a case; instead of letting everyone reimplement that
>> counting, it's better to just make callback calls fast. As I said, the
>> price is very small: all I'm asking for is removing priority from
>> callback arrays and always assuming they are the lowest priority.
>
> You realize that, as an optimization, you are fighting against not
> calling a function at all: walking an array, doing fetch and compare
> (even doing a dichotomic search). Pretty sure the benefit of not
> triggering the event will remain. Oh, and there are plenty of cases
> where you will still do the optional propagation, like for animator.
>
> As for benchmarking, I did a quick run of 'ELM_TEST_AUTOBOUNCE=300
> valgrind --tool=callgrind elementary_test -to genlist'. I see 0.90%
> of the time spent in efl_event_callback_call (~400 000 calls) and
> 0.35% in evas_object_event_callback_call (~500 000 calls). It is going
> to be very, very hard to win anything on that.
>
> I also see way bigger fish to fry:
>  - _efl_object_call_resolve 12.53%
>  - efl_data_scope_get 7.75%
>  - efl_isa 3.54%
>  - _efl_object_call_end 2.26%
>
> If you manage to win 10% on any of those, you will have achieved more
> than if you reduced the cost of calling efl_event_callback_call to 0. I
> am really not convinced that you are focusing on the right problem at
> all here.
>

As I said, the statement you are making now is not entirely fair. You
are essentially saying: "I found a very slow function that was showing
up in our benchmarks. I stopped calling it in a few cases, and now it
doesn't show up anymore, so there is no need to optimise it."

However, what happens when we use it again? Are we going to have to
chase it all around and block calls like you did? Isn't it better to
just make it "good enough for most cases" from the get-go, so that the
next time someone uses it a lot it doesn't show up in our benchmarks?

"Bigger fish to fry" - I have fried these fish a lot. Maybe there's
still room for improvement, but if there is, not much. They are just
called a shitload of times. If I remember correctly, _efl_object_call_end
is one line: _eo_unref(obj). And that one is essentially
if (--(obj->ref) == 0), which just returns in 99.99% of the cases. Not a
lot to optimise there. :)
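For illustration, the fast path described above can be sketched like this. The struct and function name are hypothetical simplifications, not the real Eo object layout or `_eo_unref` signature:

```c
#include <assert.h>

/* Hypothetical, simplified object header -- not the real Eo layout. */
typedef struct { int ref; } Obj;

/* The hot path is one decrement and one compare; only when the count
 * reaches zero do we fall into the (rare) destruction path. */
static int
obj_unref(Obj *obj)
{
   if (--(obj->ref) == 0)
     {
        /* Slow path: the real code would destroy the object here.
         * Hit in only a tiny fraction of calls. */
        return 1; /* destroyed */
     }
   return 0; /* still alive: the overwhelmingly common case */
}
```

With a single predictable branch on the decremented count, there is indeed very little left to shave off.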

--
Tom.

------------------------------------------------------------------------------
_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
