On Fri, 30 Oct 2015 18:13:49 +0000 Tom Hacohen <[email protected]> said:
> On 25/10/15 03:05, Carsten Haitzler wrote:
> > i've been spending a bit of time profiling eo.
> >
> > SETUP:
> >
> > here is my test. frankly this is a COMMON CASE test of scrolling genlist
> > around. this is incredibly common, and if it's slow people notice. so here
> > is the case:
> >
> >   export ELM_ENGINE=gl
> >   export ELM_TEST_AUTOBOUNCE=1
> >
> > then every time:
> >
> >   elementary_test -to genlist
> >
> > use perf. use valgrind/callgrind/cachegrind - whatever. the results are
> > similar. this removes any rendering (sw rendering) from the equation.
> >
> > RESULTS:
> >
> > eo is using about 25-30% of ALL CPU TIME ... just to find objects, resolve
> > functions, go in and out of eo do, call a callback (finding the callback to
> > call then calling). so about 27% is what i get from callgrind.
> >
> > 1% of all cpu time is JUST "eina_main_loop_is()" which is getting the eo
> > call stack. i've tried __thread. it's no better actually. you would think it
> > would be - but no. compiler+ld+glibc hasn't found a more efficient way of
> > having thread local vars than we have.
> >
> > but THIS IS 1% of that 27... so 1/27th of eo overhead is just this eo call
> > stack design. i think it's time we look at eo now not from a "oh but thats
> > not clean" perspective but "this is going to be faster" perspective. at
> > this point eo1 looks better because at least we didnt need an eo call stack
> > and could pass any context on the stack of the thread itself. we need to
> > reconsider this callstack and pass this into functions.
> >
> > now _eo_call_resolve uses about 7.8-8% out of our 27% cpu. this needs some
> > real looking at. i cut it down from about 10% by adding a call cache that
> > stores the last call that was looked up for that klass + op.
> >
> > it's crazy but within this func, 0.45% of our cpu time seems to simply be
> > checking if the eo op id is valid - the compare + branch... alone...
> >
> > _eo_do_start uses about 6-6.5% of our cpu time.
> > eo_data_scope_get is 5%.
> > even _eo_do_end is about 2.9%.
> >
> > these all add up and every pass through an eo interface is costing the
> > above. but we need to stand back and look at eo from a performance
> > perspective. this MAY mean making decisions and changes that are not as
> > "elegant" in the name of cutting this overhead down to less than 5%. i
> > would say that should be the goal.
> >
> > but we need to talk here.
> >
> > one thing that is causing a lot of eo chatter is a lot of:
> >
> >   blah_xxx_set()
> >
> > and some
> >
> >   blah_xxx_get()
> >
> > and in most of these cases the values are the same, ie same x,y, same r, g,
> > b, a etc. from a design perspective it'd have value to "teach" eo about at
> > least some basic property types. eg an int, a pair of ints, a double, a set
> > of 4 ints etc. etc. and eo KNOWS where in memory this property is stored in
> > the object and can avoid resolving anything if the values are already the
> > same. so think of a "pure" property that simply stores the values u give it
> > and IF they are different - possibly triggers an action. these cases mean
> > that it could be optimized outside of the object code. what we would need
> > is a way to map N input values to N pointer offsets and types in the
> > object. eo would just get, compare, and move onto the next one if the same.
> > if all same - return. if any changes, call real call.
> >
> > this would be easier with varargs imho. ie - eo1.
> >
> > we do things like try and resolve calls for null objects where near the
> > start of the resolve after getting stack - we return if its not valid.
> >
> >   if (EINA_UNLIKELY(!fptr->o.obj))
> >     return EINA_FALSE;
> >
> > like that. we could check before we resolve....
> >
> > anyway.
> >
> > i am inviting people to look into the guts of eo and think up ways to speed
> > it up - by design or any other means. i suspect the speedups we can get
> > now that are meaty enough will all be design and abi break changes.
> > so let's get on with this now.
> >
>
> Hey,
>
> A lot of it is already optimised in my devs/tasn/eo_optimisations
> branch. I think it's already down to 20% (or less? not sure) if I
> remember correctly. Hopefully, if your modifications help on top of
> mine, we'll get to 18%. I have an idea (which I've already shared with
> you on IRC) that could reduce it drastically more, and I have other
> ideas that may help in that regard too.
>
> The main idea, which may prove a bit controversial, is to increase our
> dependency on Eolian. That is, add more boiler-plate, but that
> boiler-plate will in turn increase speed. The plan is for every function,
> e.g. efl_text_set, create these definitions:
>
> EOAPI void _EO_efl_text_set(Eo_Context *ctx, const char *part, const char *text);
> #define efl_text_set(part, text) _EO_efl_text_set(__eo_ctx, part, text)
>
> This way, I could have a local variable in eo_do that is the context.
> I'm not sure if that could work in eo_do_ret (it might, I have an idea
> how, but it would be a bit slower than normal eo_do). Kolesa has already
> put in the Eolian support I needed, and I'll get into implementing it
> early next week.
>
> My plan is to come back with some stats for all the proposed changes.
> None of those are API changes, but they are ABI changes.

please do. i think that this abi change and relying on eolian to fix it up with
boilerplate is worth it if the performance gains are there. reality is that a
SMART ide could figure out the typing of the macros anyway ... we just raise
the bar for the ide...

> The biggest concern about this change is that we'll lose type annotation
> for autocompletion in IDEs, or in more simple terms, when you
> autocomplete in an IDE you'll now see: "efl_text_set(part, text)"
> instead of "efl_text_set(const char *part, const char *text)".
> Compilation errors will still work as expected.
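
a rough sketch of how the quoted macro + local context scheme could fit together, for illustration only. it is NOT the real eo code: the Eo_Context fields, Text_Obj and sketch_eo_do are made-up stand-ins, and only the _EO_efl_text_set / efl_text_set / __eo_ctx names come from the proposal above. it assumes the "do" block simply declares the context as a local and the generated macro passes it along.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins, not the real Eo types. */
typedef struct {
    void *obj;   /* object the current "do" block operates on */
    int calls;   /* number of calls made through this context */
} Eo_Context;

typedef struct {
    char text[32];
} Text_Obj;

/* Generated-style function: the context arrives as an explicit argument,
 * so there is no thread-local call stack to look up. */
static void _EO_efl_text_set(Eo_Context *ctx, const char *part, const char *text)
{
    Text_Obj *o;

    assert(ctx != NULL);
    (void)part;              /* unused in this sketch */
    o = ctx->obj;
    if (!o) return;          /* null object: bail out before any real work */
    strncpy(o->text, text, sizeof(o->text) - 1);
    o->text[sizeof(o->text) - 1] = '\0';
    ctx->calls++;
}

/* Generated-style macro from the proposal: silently passes the local context.
 * (__eo_ctx is the name used in the proposal; double-underscore names are
 * reserved in C, so real code would want to check that choice.) */
#define efl_text_set(part, text) _EO_efl_text_set(__eo_ctx, part, text)

/* Simplified "do" block (assumption): declares the context as a local. */
#define sketch_eo_do(object, ...)                          \
    do {                                                   \
        Eo_Context __eo_ctx_storage = { (object), 0 };     \
        Eo_Context *__eo_ctx = &__eo_ctx_storage;          \
        __VA_ARGS__;                                       \
        (void)__eo_ctx;                                    \
    } while (0)

/* Returns 1 if the call made inside the "do" block reached the object. */
static int demo_text_matches(void)
{
    Text_Obj label = { "" };

    sketch_eo_do(&label, efl_text_set("default", "hello"));
    return strcmp(label.text, "hello") == 0;
}
```

the caller still writes efl_text_set(part, text), so the api is unchanged, but the function signature gains a hidden argument, which is why this is an abi change.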

in fact we could generate "fake" header files that provide the types in a
function with the same name as the macro, thus it'd have typing, but this
header is for ide usage only, not actual compilation. something to consider for
eolian to do... ?

> One more thing to keep in mind, is that a lot of this code is
> SIGNIFICANTLY faster with -O2 than -O0. That is because a lot of the
> code I write, I split to inlined functions, or similar things for
> clarity, which any decent compiler will optimise, but without
> optimisations is just damn slow.

yeah. all my testing is with -O2.

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    [email protected]

_______________________________________________
enlightenment-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
