Moin Dmitry,

On Mon, October 6, 2014 09:01, Anatol Belski wrote:
> On Sun, October 5, 2014 21:32, Anatol Belski wrote:
>> Hi Dmitry,
>>
>> On Wed, October 1, 2014 08:01, Dmitry Stogov wrote:
>>> Hi Anatol,
>>>
>>> I know, TSRM uses TLS APIs internally.
>>>
>>> In my opinion, the simplest (and probably efficient) way to get rid
>>> of TSRMLS_DC arguments and TSRMLS_FETCH calls, would be introducing a
>>> global thread specific variable.
>>>
>>> __thread void ***tsrm_ls;
>>>
>>> As I understood it won't work on Windows anyway, because windows
>>> linker is not smart enough to use TLS variables across different DLLs.
>>> May be it's possible to have a local thread specific copy of tsrm_ls
>>> for each DLL, but then we should make them to be consistent...
>>>
>>> Sorry, I can't give you any advice, and can't spend a lot of time on
>>> this topic.
>>>
>>> May be description of TLS internals on ELF systems would give you
>>> some ideas.
>>>
>>> http://www.akkadia.org/drepper/tls.pdf
>>>
>>> Thanks. Dmitry.
>>>
>> I've reworked this patch to take a pointer per one shared unit. Please
>> see here
>> http://git.php.net/?p=php-src.git;a=commitdiff;h=76081df168829a5cc0409fac47c217d4927ec6f6
>> (though this was just the first in the series). Afterwards I've adapted
>> ext/standard and also converted ext/sockets as an exemplary item because
>> it's usually compiled shared.
>>
>> With this change I experience much better performance - a diff is in
>> 100-50ms range compared to the master TS build. Particular positions in
>> bench.php show even some better result.
>>
>> However this is not a global __thread variable, but a local one to
>> every shared unit. Say tsrm_ls will have to be declared in every so, dll
>> or exe and updated on request. For now I've put the update code in MINIT
>> and into the first ctor (zmm is the one in the php7ts.dll) called. The
>> ctor seems to be the only reliable place (but maybe I'm wrong), despite
>> it'll be called for every request instead of per thread, that won't be
>> very bad.
>>
>> I'd suggest to go this way so we have the same flow everywhere.
>>

The perf issue is fixed now. So far only core is converted, but here are
the Zend/bench.php results on 64 bit:

master ts linux

simple             0.158
simplecall         0.050
simpleucall        0.148
simpleudcall       0.151
mandel             0.310
mandel2            0.337
ackermann(7)       0.088
ary(50000)         0.010
ary2(50000)        0.009
ary3(2000)         0.154
fibo(30)           0.285
hash1(50000)       0.029
hash2(500)         0.023
heapsort(20000)    0.072
matrix(20)         0.082
nestedloop(12)     0.204
sieve(30)          0.062
strcat(200000)     0.014
------------------------
Total              2.185

native-tls linux

simple             0.072
simplecall         0.036
simpleucall        0.163
simpleudcall       0.169
mandel             0.297
mandel2            0.354
ackermann(7)       0.123
ary(50000)         0.010
ary2(50000)        0.009
ary3(2000)         0.158
fibo(30)           0.396
hash1(50000)       0.030
hash2(500)         0.024
heapsort(20000)    0.072
matrix(20)         0.069
nestedloop(12)     0.130
sieve(30)          0.054
strcat(200000)     0.011
------------------------
Total              2.178

master ts windows

simple             0.100
simplecall         0.048
simpleucall        0.146
simpleudcall       0.120
mandel             0.292
mandel2            0.364
ackermann(7)       0.091
ary(50000)         0.009
ary2(50000)        0.008
ary3(2000)         0.133
fibo(30)           0.238
hash1(50000)       0.025
hash2(500)         0.020
heapsort(20000)    0.076
matrix(20)         0.069
nestedloop(12)     0.168
sieve(30)          0.048
strcat(200000)     0.011
------------------------
Total              1.965

native-tls windows

simple             0.100
simplecall         0.050
simpleucall        0.108
simpleudcall       0.110
mandel             0.292
mandel2            0.347
ackermann(7)       0.097
ary(50000)         0.009
ary2(50000)        0.008
ary3(2000)         0.140
fibo(30)           0.280
hash1(50000)       0.025
hash2(500)         0.021
heapsort(20000)    0.075
matrix(20)         0.072
nestedloop(12)     0.176
sieve(30)          0.048
strcat(200000)     0.010
------------------------
Total              1.969

There is still some room for improvement (for instance the fibo results),
but the overall result shows at least the same performance now.

What do you think, guys?

Regards

Anatol
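
For readers who want the "one pointer per shared unit" idea from the quoted discussion in concrete form, here is a minimal sketch. It is not the code from the patch: all names (demo_storage, core_cache, ext_cache, the *_update_cache hooks, run_demo) are hypothetical, both "units" are squeezed into one file, and it assumes a POSIX system with a compiler that supports __thread. Each unit keeps its own thread-local copy of the pointer and refreshes it at a known point, which stands in for the MINIT/ctor update described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-thread storage that TSRM hands out. */
typedef struct {
    int counter;
} demo_storage;

/* Each "shared unit" keeps its own thread-local copy of the pointer.
 * In a real build these would live in separate .so/.dll/.exe files. */
static __thread demo_storage *core_cache = NULL; /* e.g. the main library */
static __thread demo_storage *ext_cache  = NULL; /* e.g. a shared extension */

/* Refresh hooks, meant to be called at a known point such as request start. */
static void core_update_cache(demo_storage *ls) { core_cache = ls; }
static void ext_update_cache(demo_storage *ls)  { ext_cache = ls; }

/* Code inside a unit reads its own cache instead of taking an extra argument. */
static void core_do_work(void) { core_cache->counter += 1; }
static void ext_do_work(void)  { ext_cache->counter += 10; }

/* One simulated request on one thread. */
static void *demo_thread(void *arg)
{
    demo_storage *ls = arg; /* this thread's own storage */

    core_update_cache(ls);
    ext_update_cache(ls);
    assert(core_cache == ext_cache); /* the per-unit copies must agree */

    core_do_work();
    ext_do_work();
    return NULL;
}

/* Runs two threads with separate storage.
 * Returns 0 if each thread only touched its own storage, -1 on thread
 * creation failure, 1 otherwise. */
int run_demo(void)
{
    demo_storage a = {0}, b = {0};
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, demo_thread, &a) != 0) {
        return -1;
    }
    if (pthread_create(&tb, NULL, demo_thread, &b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    return (a.counter == 11 && b.counter == 11) ? 0 : 1;
}
```

To try it, call run_demo() from a main() and build with -pthread. The point of the sketch is only the shape of the flow: every unit declares its own cache, and something has to refresh all of them consistently before code in that unit runs on a given thread.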