On Fri, 30 Oct 2015 18:13:49 +0000 Tom Hacohen <[email protected]> said:

> On 25/10/15 03:05, Carsten Haitzler wrote:
> > i've been spending a bit of time profiling eo.
> >
> > SETUP:
> >
> > here is my test. frankly this is a COMMON CASE test: scrolling a genlist
> > around. this is incredibly common, and if it's slow, people notice. so
> > here is the case:
> >
> >    export ELM_ENGINE=gl
> >    export ELM_TEST_AUTOBOUNCE=1
> >
> > then every time:
> >
> >    elementary_test -to genlist
> >
> > use perf. use valgrind/callgrind/cachegrind - whatever. the results are
> > similar. using the gl engine removes any (sw) rendering from the equation.
> >
> > RESULTS:
> >
> > eo is using about 25-30% of ALL CPU TIME... just to find objects, resolve
> > functions, go in and out of eo_do, and call callbacks (finding the
> > callback to call, then calling it). about 27% is what i get from
> > callgrind.
> >
> > 1% of all cpu time is JUST "eina_main_loop_is()" which is getting the eo
> > call stack. i've tried __thread. it's no better actually. you would think
> > it would be - but no. compiler+ld+glibc haven't found a more efficient way
> > of having thread local vars than we have.
> >
> > but THIS IS 1% of that 27... so 1/27th of eo overhead is just this eo call
> > stack design. i think it's time we look at eo now not from an "oh but
> > that's not clean" perspective but a "this is going to be faster"
> > perspective. at this point eo1 looks better because at least we didn't
> > need an eo call stack and could pass any context on the stack of the
> > thread itself. we need to reconsider this call stack and pass the context
> > into functions.
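a micro-sketch of the two designs contrasted above - an implicit per-thread
call stack in TLS vs. context passed explicitly on the C stack as eo1 did.
this is illustrative only, not real Eo code; the names are made up:

```c
/* Illustrative only: implicit TLS call context vs. explicit argument.
 * The point above is that the TLS load is paid on every single call. */
#include <assert.h>

typedef struct { int depth; } Call_Ctx;

static __thread Call_Ctx tls_ctx;    /* eo2-style: looked up via TLS */

static void
op_via_tls(void)
{
   tls_ctx.depth++;                  /* every call pays the TLS access */
}

static void
op_via_arg(Call_Ctx *ctx)
{
   ctx->depth++;                     /* context rides on the thread's stack */
}
```

both do the same work; the difference is purely where the context lives and
how it is reached on each call.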
> >
> > now _eo_call_resolve uses about 7.8-8% out of our 27% cpu. this needs a
> > real look. i cut it down from about 10% by adding a call cache that stores
> > the last call that was looked up for that klass + op.
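a one-entry cache of that shape might look like this (all the types and names
here are illustrative stand-ins, not the real Eo internals):

```c
/* Sketch of a one-entry call cache keyed on (klass, op). Hypothetical
 * stand-in types; not the actual Eo data structures. */
#include <assert.h>
#include <stddef.h>

typedef unsigned int Eo_Op;
typedef struct { int id; } Klass;    /* stand-in for the real class */
typedef void *Eo_Func;               /* stand-in for a resolved fn pointer */

typedef struct {
   const Klass *klass;
   Eo_Op        op;
   Eo_Func      func;
} Call_Cache;

static Call_Cache _cache = { NULL, 0, NULL };
static int _slow_lookups = 0;        /* counts full (slow) lookups */

static Eo_Func
_slow_resolve(const Klass *klass, Eo_Op op)
{
   (void)klass; (void)op;
   _slow_lookups++;                  /* stand-in for the real vtable walk */
   return (Eo_Func)0x1;
}

static Eo_Func
call_resolve(const Klass *klass, Eo_Op op)
{
   /* hit: same (klass, op) pair as the previous lookup */
   if ((_cache.klass == klass) && (_cache.op == op))
     return _cache.func;
   /* miss: do the full lookup, then remember it */
   _cache.klass = klass;
   _cache.op    = op;
   _cache.func  = _slow_resolve(klass, op);
   return _cache.func;
}
```

the win comes from call sites that hammer the same op on the same class in a
tight loop - exactly the genlist scrolling case.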
> >
> > it's crazy, but within this func 0.45% of our cpu time seems to simply be
> > checking if the eo op id is valid - the compare + branch... alone...
> >
> > _eo_do_start uses about 6-6.5% of our cpu time. eo_data_scope_get is 5%.
> > _eo_do_end even is about 2.9%.
> >
> > these all add up and every pass through an eo interface is costing the
> > above. but we need to stand back and look at eo from a performance
> > perspective. this MAY mean making decisions and changes that are not as
> > "elegant" in the name of cutting this overhead down to less than 5%. i
> > would say that should be the goal.
> >
> > but we need to talk here.
> >
> > one thing that is causing a lot of eo chatter is a lot of:
> >
> >     blah_xxx_set()
> >
> > and some
> >
> >     blah_xxx_get()
> >
> > and in most of these cases the values passed in are the same as what is
> > already set - same x,y, same r,g,b,a etc. from a design perspective it'd
> > have value to "teach" eo about at least some basic property types. eg an
> > int, a pair of ints, a double, a set of 4 ints etc. etc. then eo KNOWS
> > where in memory this property is stored in the object and can avoid
> > resolving anything if the values are already the same. so think of a
> > "pure" property that simply stores the values you give it and IF they are
> > different - possibly triggers an action. these cases mean it could be
> > optimized outside of the object code. what we would need is a way to map N
> > input values to N pointer offsets and types in the object. eo would just
> > get, compare, and move on to the next one if the same. if all are the same
> > - return. if any changed, make the real call.
> >
> > this would be easier with varargs imho. ie - eo1.
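the offset-table idea above could be sketched like this - a generic compare
pass over member offsets that bails out before any resolve happens when
nothing changed (names and the geometry property are illustrative, not real
Eo API):

```c
/* Sketch of a "pure property" fast path: a table of member offsets
 * lets generic code compare incoming values against what the object
 * already stores, and skip the real call when nothing changed.
 * Hypothetical names; not the actual Eo implementation. */
#include <assert.h>
#include <stddef.h>

typedef struct {
   int x, y, w, h;
   int set_calls;   /* counts how often the real setter actually ran */
} Obj;

/* offsets of the ints backing the "geometry" property */
static const size_t geometry_offs[] = {
   offsetof(Obj, x), offsetof(Obj, y),
   offsetof(Obj, w), offsetof(Obj, h)
};

static void
real_geometry_set(Obj *o, const int vals[4])
{
   o->x = vals[0]; o->y = vals[1]; o->w = vals[2]; o->h = vals[3];
   o->set_calls++;   /* the expensive resolved call happened */
}

static void
geometry_set(Obj *o, const int vals[4])
{
   size_t i;
   char *base = (char *)o;

   /* generic compare pass: no resolve, no call stack, just loads */
   for (i = 0; i < 4; i++)
     if (*(int *)(base + geometry_offs[i]) != vals[i]) break;
   if (i == 4) return;   /* all values unchanged: skip the real call */
   real_geometry_set(o, vals);
}
```

the per-property data eolian would need to emit is just the offset table and
the element type - everything else is shared generic code.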
> >
> > we do things like try to resolve calls for null objects, where near the
> > start of the resolve, after getting the stack, we return if it's not
> > valid:
> >
> >    if (EINA_UNLIKELY(!fptr->o.obj))
> >      return EINA_FALSE;
> >
> > like that. we could check before we resolve....
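the reordering would just hoist the check ahead of all the stack/resolve
work, something like this (illustrative names, not the real functions):

```c
/* Sketch of checking the object BEFORE any resolve work, instead of
 * after getting the call stack. Hypothetical stand-in code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UNLIKELY(x) __builtin_expect(!!(x), 0)

static int _resolve_work = 0;   /* counts how often resolve work ran */

static bool
eo_do_start(void *obj)
{
   if (UNLIKELY(!obj)) return false;  /* cheap early out, zero resolve cost */
   _resolve_work++;                   /* ... get stack, resolve, etc. ... */
   return true;
}
```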
> >
> > anyway.
> >
> > i am inviting people to look into the guts of eo and think up ways to
> > speed it up - by design changes or any other means. i suspect the speedups
> > we can get now that are meaty enough will all be design and abi break
> > changes. so let's get on with this now.
> >
> 
> Hey,
> 
> A lot of it is already optimised in my devs/tasn/eo_optimisations 
> branch. I think it's already down to 20% (or less? not sure) if I 
> remember correctly. Hopefully, if your modifications help on top of 
> mine, we'll get to 18%. I have an idea (which I've already shared with 
> you on IRC) that could reduce it drastically more, and I have other 
> ideas that may help in that regard too.
> 
> The main idea, which may prove a bit controversial, is to increase our 
> dependency on Eolian. That is, add more boiler-plate, but that 
> boiler-plate will increase speed. The plan is, for every function, 
> e.g. efl_text_set, to create these definitions:
> 
> EOAPI void _EO_efl_text_set(Eo_Context *ctx, const char *part,
>                             const char *text);
> #define efl_text_set(part, text) _EO_efl_text_set(__eo_ctx, part, text)
> 
> this way, I could have a local variable in eo_do that is the context. 
> I'm not sure if that could work in eo_do_ret (it might, I have an idea 
> how, but it would be a bit slower than normal eo_do). Kolesa has already 
> put in the Eolian support I needed, and I'll get into implementing it 
> early next week.
> 
> My plan is to come back with some stats for all the proposed changes. 
> None of those are API changes, but they are ABI changes.

please do. i think that this abi change and relying on eolian to fix it up with
boilerplate is worth it if the performance gains are there. reality is that a
SMART ide could figure out the typing of the macros anyway... we just raise the
bar for the ide...
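a minimal expansion of the proposed scheme might look like this (the
_EO_efl_text_set / efl_text_set names come from the quoted mail; the eo_do()
shape and Fake_Obj are my assumptions for illustration only):

```c
/* Sketch: eo_do() materialises the context once on this thread's
 * stack, and the Eolian-generated per-function macros forward it.
 * The eo_do() definition and Fake_Obj are hypothetical. */
#include <assert.h>
#include <string.h>

typedef struct { const char *text; } Fake_Obj;
typedef struct { Fake_Obj *obj; } Eo_Context;

static void
_EO_efl_text_set(Eo_Context *ctx, const char *part, const char *text)
{
   (void)part;
   ctx->obj->text = text;   /* stand-in for the real implementation */
}

/* the generated wrapper forwards the local context variable */
#define efl_text_set(part, text) _EO_efl_text_set(__eo_ctx, part, text)

/* eo_do() sets up the context once, then runs the call batch */
#define eo_do(o, ...) \
  do { Eo_Context __eo_ctx_store = { (o) }; \
       Eo_Context *__eo_ctx = &__eo_ctx_store; \
       __VA_ARGS__; } while (0)
```

usage would be unchanged at the call site, e.g.
`eo_do(&o, efl_text_set("default", "hello"));` - no global call stack, no TLS
lookup, just a local pointer the compiler can keep in a register.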

> The biggest concern about this change is that we'll lose type annotation 
> for autocompletion in IDEs, or in more simple terms, when you 
> autocomplete in an IDE you'll now see: "efl_text_set(part, text)" 
> instead of "efl_text_set(const char *part, const char *text)". 
> Compilation errors will still work as expected.

in fact we could generate "fake" header files that provide the types in a
function with the same name as the macro, so it'd have typing, but this header
would be for ide usage only, not actual compilation. something to consider for
eolian to do...?
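the shape of such a generated header could be as simple as this (the
EFL_IDE_HINTS guard name is hypothetical, just to show the idea):

```c
/* Hypothetical Eolian-generated "IDE-only" header: real prototypes
 * carrying full types, guarded so the actual build never sees them. */
#ifdef EFL_IDE_HINTS   /* defined only in the IDE's parser configuration */
EOAPI void efl_text_set(const char *part, const char *text);
#else
#define efl_text_set(part, text) _EO_efl_text_set(__eo_ctx, part, text)
#endif
```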

> One more thing to keep in mind is that a lot of this code is 
> SIGNIFICANTLY faster with -O2 than -O0. That is because a lot of the 
> code I write, I split into inlined functions or similar for clarity, 
> which any decent compiler will optimise, but which without 
> optimisations is just damn slow.

yeah. all my testing is with -O2.

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    [email protected]


------------------------------------------------------------------------------
_______________________________________________
enlightenment-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
