i've been spending a bit of time profiling eo.

SETUP:

here is my test. frankly this is a COMMON CASE test of scrolling genlist around. this is incredibly common, and if it's slow people notice. so here is the case:

  export ELM_ENGINE=gl
  export ELM_TEST_AUTOBOUNCE=1

then every time:

  elementary_test -to genlist

use perf. use valgrind/callgrind/cachegrind - whatever. the results are similar. this removes any rendering (sw rendering) from the equation.

RESULTS:

eo is using about 25-30% of ALL CPU TIME ... just to find objects, resolve functions, go in and out of eo do, call a callback (finding the callback to call then calling). so about 27% is what i get from callgrind.

1% of all cpu time is JUST "eina_main_loop_is()" which is getting the eo call stack. i've tried __thread. it's no better actually. you would think it would be - but no. compiler+ld+glibc hasn't found a more efficient way of having thread local vars than we have. but THIS IS 1% of that 27... so 1/27th of eo overhead is just this eo call stack design. i think it's time we look at eo now not from an "oh but that's not clean" perspective but a "this is going to be faster" perspective. at this point eo1 looks better because at least we didn't need an eo call stack and could pass any context on the stack of the thread itself. we need to reconsider this call stack and pass this into functions.

now _eo_call_resolve uses about 7.8-8% out of our 27% cpu. this needs some real looking at. i cut it down from about 10% by adding a call cache that stores the last call that was looked up for that klass + op. it's crazy but within this func, 0.45% of our cpu time seems to simply be checking if the eo op id is valid: the compare + branch... alone...

_eo_do_start uses about 6-6.5% of our cpu time. eo_data_scope_get is 5%. _eo_do_end even is about 2.9%.

these all add up and every pass through an eo interface is costing the above. but we need to stand back and look at eo from a performance perspective.

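to make the call cache idea above concrete, here is a minimal sketch of "remember the last call looked up for a klass + op". this is NOT the actual eo code: Klass, Func, Call_Cache, resolve_slow(), resolve_cached(), noop_a() and noop_b() are all hypothetical names made up for illustration, and the "slow" lookup here is just a table index standing in for whatever the real resolve does.

```c
#include <assert.h>
#include <stddef.h>

/* all names below are hypothetical, for illustration only */
typedef void (*Func)(void *obj_data);

#define OP_MAX 4

typedef struct {
   Func funcs[OP_MAX]; /* per-class function table indexed by op id */
} Klass;

typedef struct {
   const Klass *klass; /* class of the last successful lookup */
   unsigned int op;    /* op id of the last successful lookup */
   Func         func;  /* function that lookup resolved to */
} Call_Cache;

/* counts how often the slow path runs, so the effect is visible */
static unsigned long slow_lookups = 0;

/* two do-nothing functions to populate a class table with */
static void noop_a(void *obj_data) { (void)obj_data; }
static void noop_b(void *obj_data) { (void)obj_data; }

/* stand-in for the expensive resolve: validates the op, then looks it up */
static Func
resolve_slow(const Klass *klass, unsigned int op)
{
   slow_lookups++;
   if (op >= OP_MAX) return NULL;
   return klass->funcs[op];
}

/* fast path: if klass + op match the previous call, skip the resolve */
static Func
resolve_cached(Call_Cache *cache, const Klass *klass, unsigned int op)
{
   Func f;

   assert(cache && klass);
   if ((cache->klass == klass) && (cache->op == op)) return cache->func;
   f = resolve_slow(klass, op);
   if (f)
     {
        /* only remember successful lookups */
        cache->klass = klass;
        cache->op = op;
        cache->func = f;
     }
   return f;
}
```

a caller would keep one Call_Cache per call site (or per thread), start it zeroed, and call resolve_cached() instead of the slow resolve. a single-entry cache like this only pays off when the same klass + op repeats back to back, which is the pattern you get when scrolling a list of same-typed items.
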
this MAY mean making decisions and changes that are not as "elegant" in the name of cutting this overhead down to less than 5%. i would say that should be the goal. but we need to talk here.

one thing that is causing a lot of eo chatter is a lot of blah_xxx_set() and some blah_xxx_get() calls, and in most of these cases the values are the same, i.e. same x, y, same r, g, b, a etc. from a design perspective it'd have value to "teach" eo about at least some basic property types, e.g. an int, a pair of ints, a double, a set of 4 ints etc. etc. then eo KNOWS where in memory this property is stored in the object and can avoid resolving anything if the values are already the same. so think of a "pure" property that simply stores the values you give it and IF they are different - possibly triggers an action. these cases mean that it could be optimized outside of the object code. what we would need is a way to map N input values to N pointer offsets and types in the object. eo would just get, compare, and move on to the next one if the same. if all same - return. if any changes, call the real call. this would be easier with varargs imho. i.e. - eo1.

we do things like try and resolve calls for null objects, where near the start of the resolve, after getting the stack, we return if it's not valid:

  if (EINA_UNLIKELY(!fptr->o.obj)) return EINA_FALSE;

like that. we could check before we resolve....

anyway. i am inviting people to look into the guts of eo and think up ways to speed it up - by design or any other means. i suspect the speedups we can get now that are meaty enough will all be design and abi break changes. so let's get on with this now.

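a rough sketch of the "pure property" idea above: map N input values to N offsets + types in the object data, compare, and only make the real call if something changed. this is an illustration under assumptions, not eo api: Prop_Type, Prop_Field, Obj_Data, props_unchanged(), pos_set() and real_pos_set() are all invented names. values are compared bytewise with memcmp(), which for doubles means 0.0 and -0.0 count as "different" (that just costs one extra real call).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* all names below are hypothetical, for illustration only */
typedef enum { PROP_INT, PROP_DOUBLE } Prop_Type;

typedef struct {
   Prop_Type type;   /* basic type the generic layer knows about */
   size_t    offset; /* where the value lives inside the object data */
} Prop_Field;

typedef struct {
   int    x, y;
   double scale;
   int    real_calls; /* counts trips through the real setter */
} Obj_Data;

static size_t
prop_size(Prop_Type t)
{
   return (t == PROP_INT) ? sizeof(int) : sizeof(double);
}

/* returns 1 if every incoming value already matches what is stored */
static int
props_unchanged(const void *data, const Prop_Field *fields,
                const void *const *values, size_t n)
{
   size_t i;

   for (i = 0; i < n; i++)
     {
        const char *stored = (const char *)data + fields[i].offset;
        if (memcmp(stored, values[i], prop_size(fields[i].type)) != 0)
          return 0; /* first difference: the real call is needed */
     }
   return 1;
}

/* description of the "position" property: two ints at known offsets */
static const Prop_Field pos_fields[] = {
   { PROP_INT, offsetof(Obj_Data, x) },
   { PROP_INT, offsetof(Obj_Data, y) },
};

/* the "real call": stores the values and would trigger any side effects */
static void
real_pos_set(Obj_Data *d, int x, int y)
{
   d->x = x;
   d->y = y;
   d->real_calls++;
}

/* generic front end: get, compare, return early if nothing changed */
static void
pos_set(Obj_Data *d, int x, int y)
{
   const void *vals[2];

   assert(d);
   vals[0] = &x;
   vals[1] = &y;
   if (props_unchanged(d, pos_fields, vals, 2)) return;
   real_pos_set(d, x, y);
}
```

in this sketch the early-out lives in a hand-written pos_set(); the point of the proposal is that a generic layer could do the same from the Prop_Field table alone, before any function resolve happens.
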
-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    ras...@rasterman.com