Hi,

I spent some time reviewing Andrea's old idea about packed strings.

https://github.com/hikari-no-yume/php-src/tree/packedStrings

The idea is simple. In every place where we use zend_string*, we may store 
the characters directly.

We use the low byte to encode the packed-string marker and the string length, 
and we also need one byte for the trailing zero, so we can keep up to 2 
characters on a 32-bit system and up to 6 characters on a 64-bit system 
without allocating additional memory.
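
A minimal sketch of such an encoding, to make the arithmetic concrete. The bit layout (marker in bit 0, length in the rest of the low byte) and all names (packed_str, packed_make, ...) are my assumptions for illustration and are not taken from the PoC branch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of one pointer-sized word:
 *   bit 0       packed marker (assumes real pointers are aligned, so bit 0 is 0)
 *   bits 1..7   string length
 *   next bytes  the characters, always followed by at least one zero byte */
typedef uintptr_t packed_str;

/* One byte for marker+length, one for the trailing zero: 2 on 32-bit, 6 on 64-bit. */
#define PACKED_MAX_LEN (sizeof(packed_str) - 2)

static int packed_is_packed(packed_str s)
{
    return (int)(s & 1);
}

static size_t packed_len(packed_str s)
{
    return (size_t)((s & 0xff) >> 1);
}

static packed_str packed_make(const char *src, size_t len)
{
    packed_str s;
    size_t i;

    assert(len <= PACKED_MAX_LEN);
    s = ((packed_str)len << 1) | 1;
    for (i = 0; i < len; i++) {
        /* character i goes into byte i + 1; the top byte stays zero */
        s |= (packed_str)(unsigned char)src[i] << (8 * (i + 1));
    }
    return s;
}

/* Copy the characters out; dst must have room for PACKED_MAX_LEN + 1 bytes. */
static void packed_copy(packed_str s, char *dst)
{
    size_t len = packed_len(s);
    size_t i;

    for (i = 0; i < len; i++) {
        dst[i] = (char)((s >> (8 * (i + 1))) & 0xff);
    }
    dst[len] = '\0';
}
```

The sketch uses shifts instead of a pointer into the word, so it does not depend on byte order.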


The refreshed (dirty) PoC implementation:
https://github.com/php/php-src/compare/master...dstogov:packedStrings2?expand=1

It is enough to take a quick look at the zend_string.h changes (the rest is 
mostly monkey work).


I was able to run bench.php, but I probably won't go any further.

Unfortunately, I ran into two serious problems:


1) The original implementation used the packed strings themselves as their hash 
value. This led to a huge slowdown because of hash collisions (e.g. in 
bench.php hash1()). I switched to recalculating the hash on each use, but this 
negates the benefit of eliminating the allocation. We could probably use a 
cheaper hash function for packed strings...
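
To illustrate what a cheaper hash could look like: the sketch below mixes the raw 64-bit word with the MurmurHash3 64-bit finalizer instead of looping over the characters. This is my assumption of one possible candidate, not something the PoC does, and whether it is really cheaper would have to be measured. It also shows one plausible source of the collisions: if the raw word is the hash and the bucket is taken from the low bits, then all packed strings of the same length share the marker/length byte and land in the same bucket. The names packed_hash and bucket_of are hypothetical.

```c
#include <stdint.h>

/* Bucket selection by masking the low bits of the hash (simplified model). */
static uint64_t bucket_of(uint64_t hash, uint64_t mask)
{
    return hash & mask;
}

/* MurmurHash3 fmix64 finalizer applied to the packed word.
 * It is a bijection on 64-bit values, so two different packed strings
 * never get the same full hash; only bucket collisions remain possible. */
static uint64_t packed_hash(uint64_t word)
{
    word ^= word >> 33;
    word *= UINT64_C(0xff51afd7ed558ccd);
    word ^= word >> 33;
    word *= UINT64_C(0xc4ceb9fe1a85ec53);
    word ^= word >> 33;
    return word;
}
```

With the identity hash, the words 0x6103 and 0x6203 (two different one-character strings under a marker/length low byte of 0x03) fall into the same bucket for any mask up to 0xff. After mixing, the full hashes are guaranteed to differ.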


2) PHP still uses char* in many places. When we take ZSTR_VAL() of a packed 
string stored in a local variable (or a function argument), we can very easily 
end up with a dangling pointer (e.g. INI directives processed by 
OnUpdateString, internal function parameters received as char*, ...). Changing 
all these char* into zend_string* would help, but looks unrealistic for PHP-7.3.
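
The hazard can be sketched as follows. Everything here is a hypothetical stand-in (packed_str, packed_val, setting_copy); it is not the real ZSTR_VAL() or OnUpdateString code. I assume the header byte is the first byte in memory, which is the low byte on little-endian machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t packed_str; /* hypothetical stand-in for a packed zend_string* */

/* Build a packed word in memory order: header byte, characters, zero padding. */
static packed_str packed_make(const char *src, size_t len)
{
    unsigned char bytes[sizeof(packed_str)] = {0};
    packed_str s;

    assert(len <= sizeof(packed_str) - 2);
    bytes[0] = (unsigned char)((len << 1) | 1);
    memcpy(bytes + 1, src, len);
    memcpy(&s, bytes, sizeof s);
    return s;
}

/* The characters live inside the word itself, so the returned pointer
 * is valid only as long as the variable *s is alive. */
static const char *packed_val(const packed_str *s)
{
    return (const char *)s + 1;
}

/* HAZARD (left commented out on purpose): s is a by-value copy on the
 * callee's stack, so the returned pointer dangles as soon as we return.
 *
 * const char *setting_broken(packed_str s) { return packed_val(&s); }
 */

/* Safer variant: copy the characters into storage owned by the caller. */
static void setting_copy(packed_str s, char *dst, size_t dst_size)
{
    const char *val = packed_val(&s);
    size_t len = strlen(val);

    if (dst_size == 0) {
        return;
    }
    if (len >= dst_size) {
        len = dst_size - 1;
    }
    memcpy(dst, val, len);
    dst[len] = '\0';
}
```

Any code that keeps the char* beyond the lifetime of the variable holding the packed word would need this kind of copy (or a switch to zend_string*).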


So, I gave up for now.

I decided to share these results. Maybe someone will get related ideas.


Thanks. Dmitry.
