Carsten Haitzler (The Rasterman) wrote:
> after. i've run the tests myself now.
>
> i'll add some stats. e17 maintains about 3000-4000 unique strings in
> evas_stringshare. in my tests. i also checked on number of adds, dels and
> lookups in evas_stringshare usage. about 6.48% of calls are for adds of a
> unique string. 5.12% are for deletes of a string. 88.41% are accesses (an add
> or a del of an existing string where the operation does not free or allocate
> any memory but just refcounts up or down).
>
> now based on this, the existing stringshare vs. string_instance means, that if
> you account for the relative usage of add, del and access paths,
> string_instance gets overall faster once you have about 3200 or so strings. so
> e17 is about at the cusp point. it's also one of the heavier users i imagine.
>
> i simply changed hash bucket size to be 1024 items instead of 256 and that
> makes the crossover point at about 7200 strings - well beyond normal usage
> of e17 anyway. adds and deletes are still significantly faster (string
> instance takes 1.8 and 1.4 times the time respectively compared to
> stringshare) even at 10,000 strings. with the 1024 buckets for stringshare
> of course. but string_instance takes 0.8 times the time for lookups.
>
> overall it's a close race. i'll try to improve stringshare a little and see
> what i get, but beyond making it have dynamic bucket sizes (like ecore_hash)
> it isn't likely to go far. a dynamic bucket size will mean it will scale
> very high (question: do we need it to go that high?) but the idea of having
> to re-do the bucket array at certain points is a little uncomfortable (so
> you go from 3000 to 3001 and the system has to spend extra cycles re-packing
> all hash items into a new, bigger bucket set). yes - this is nicer in terms
> of base mem usage of course. so the thing here is to figure out how to get
> the best thing we can for the least cost. i do know stringshare has a 1
> alloc per unique string overhead. that's about as small as it gets (also
> either 8 or 12 bytes (32 or 64bit) per unique string for refcount and
> pointer).
>   

The (only) one alloc is most probably the reason why it has better results
for adds and removes than the ecore counterparts. It's probably possible
to improve the situation for ecore_hash somehow, but allocating only one
piece of memory per item would make the ecore_hash code more complex. I
think it isn't worth it, because I don't see a general use for such a
feature.
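The single-alloc layout is easy to picture: the header (bucket chain pointer plus refcount) and the string body live in one malloc'd block, e.g. via a C99 flexible array member. A minimal sketch with hypothetical names - this mirrors the overhead described in the mail, not the actual evas source:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-allocation node: header and string data share
 * one malloc, matching the "1 alloc per unique string" overhead plus
 * the per-string refcount and pointer the mail mentions. */
typedef struct _Str_Node Str_Node;
struct _Str_Node
{
   Str_Node     *next;  /* hash-bucket chain        */
   unsigned int  refs;  /* reference count          */
   char          str[]; /* string data stored inline */
};

static Str_Node *
str_node_new(const char *s)
{
   size_t    len = strlen(s) + 1;
   Str_Node *n = malloc(sizeof(Str_Node) + len); /* the single alloc */

   if (!n) return NULL;
   n->next = NULL;
   n->refs = 1;
   memcpy(n->str, s, len);
   return n;
}
```

Doing the same through a generic hash would cost a second allocation for the node (or the key copy) per item, which is exactly the extra work the ecore side pays on every add and remove.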

> now... more interestingly - i now started looking at the test. it is very
> artificial. 

Yes, I know. I started reading the two implementations and saw the
differences, and I wondered how they affect the performance. How much is
the overhead of the two allocations? Where will evas start to suck because
of the static bucket count? So I've written the test case to get a feeling
for it. I know it is artificial, but I think it is still interesting, even
if it doesn't represent normal usage.

> at most 2 copies of a string, so n repeated adds, and all short
> strings. not very representative of common usage. in common usage i have
> seen some strings with refcounts of > 200. in fact the del won't work with
> stringshare. on del you need to supply the actual string pointer - not the
> snprintf'd buffer. so nothing gets found and deleted. i.e. it's meant to
> work with:
>
> const char *s;
>
> s = evas_stringshare_add("string");
> ...
> evas_stringshare_del(s);
>
> i.e. - the same return from stringshare.
> evas_stringshare_del("string") will not work.
>
> ... so as such you need a test that is more representative of actual usage.
>
> so that's just what i did. i literally logged all stringshare adds, dels
> etc. in such a way it'd produce "correct code" from a session of e17 i
> fiddled with for about 5 minutes doing stuff. you'll forgive me for not
> including the code as the .c file generated is 11m (239,000 lines of c)
> that i included into the compare.c infra to test both ecore and evas code
> and just time it doing exactly what e does. i also included "nops" where
> functions are called but do nothing, so we can remove the simple test
> harness and function call overhead and compare just the core. as it was a
> little too fast i made it run 1000 loops of what e actually does one after
> the other (yes it'll be bad as it doesn't start with a clean slate but
> better than nothing). result for 1000 iterations one after the other:
>
> evas: 20.691495
> ecore: 30.510302
> nops: 3.444793
>
> real factor: 1.57
>
> so really 20.69 - 3.44 vs 30.51 - 3.44 - i.e. 17.25 vs 27.07 (evas being
> the lower). yes - this means a lot of things will get high refcounts as
> things get re-added a lot and then not removed, so the raw results of only
> 1 iteration:
>
> evas: 0.031672
> ecore: 0.045482
> nops: 0.004831
>
> real factor: 1.51
>
> not as accurate as the times are so small, but the same order of magnitude as
> above.
>
> so as such... if we are doing benchmarks to know which implementation to
> use to find the one with best results - at least for the case of e17, evas
> is the winner here. of course if you think e17's use case is pretty
> atypical and you need another one, we should continue to check.
>
> comments?
>   

Imho, the key point is how many strings are added to the hash. For small
apps that only share a dozen strings, the hash is bigger than needed,
although the extra 4k is probably an overhead most people can live with.
And I don't know if there are apps where the number of strings ever
exceeds 10,000 - presumably not. I haven't tested yet how many shared
strings ewl has. I think as long as no one reports using evas_stringshare
or ecore_string with much more than 10,000 strings, we can take the evas
code for the new lib.

BTW, we will need evas_stringshare_init()/_shutdown() functions, so that
nobody has to pay for the 4k of memory if they don't use stringshare, and
_shutdown() can clean up the memory if there are still unreleased strings.
For instance, ewl loses some references intentionally, because they are
used quite often. I'll write them if the evas implementation is chosen.
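A refcounted init/shutdown pair in the usual EFL style could look like this. The function names follow the proposal above; everything else is a sketch, not the final implementation:

```c
#include <stdio.h>

/* How many times init has been called; the buckets exist only while
 * this is above zero. Hypothetical sketch of the proposed API. */
static int _init_count = 0;

int
evas_stringshare_init(void)
{
   if (_init_count++ == 0)
     {
        /* first init: allocate the bucket array here
         * (e.g. 1024 pointers, the 4k mentioned above) */
     }
   return _init_count;
}

int
evas_stringshare_shutdown(void)
{
   if (--_init_count == 0)
     {
        /* last shutdown: walk the buckets, report strings that still
         * have references (the intentional ewl leaks, for instance),
         * then free the nodes and the bucket array */
     }
   return _init_count;
}
```

Returning the count from both calls lets callers nest init/shutdown pairs safely, the same pattern the other ecore/evas subsystems already use.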

Peter

_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
