Hi Dmitry,
Dmitry Stogov wrote:
I was skeptical about this idea, but the PoC looks interesting and quite simple.
This might be too big change for 7.*, if we won't completely hide
implementation details for extensions using existing macros...
I've wondered about 7.x. You can't *completely* hide the implementation
details, because some existing assumptions about pointers to
zend_strings can't hold if the value is held within the pointer itself,
rather than in the place it points to. A simple example:
zend_string *a = zend_string_init("foo", 3, 0);
zend_string *b = a;
ZSTR_VAL(a)[0] = 'b';
ZSTR_VAL(a)[1] = 'a';
ZSTR_VAL(a)[2] = 'r';
return b;
Without packed strings, if you return b, you get a zend_string
containing "foo". With packed strings, you get one containing "bar".
Packed strings don't create any problems if you're just passing on
strings or creating new ones from existing data. But they can create
problems if you mutate an existing string (or create an empty string and
then fill it), and hold more than one pointer to it.
The difficult part is that I don't (currently) know how much code makes
these assumptions or is broken by it. I do know that there is code that
is broken, because PHP produced some unusual errors when I tried that
branch, though it did at least run.
One nice thing, though, is that the API itself doesn't really change,
you just have to be more careful about how you use it. So,
optimistically, breakage might be quite small. I don't know how much
breakage there is currently.
I'm not sure if this will lead to performance improvement.
On one hand, we won't need to read elements form referenced zend_string
structure (should improve data locality and cache usage),
You also don't have to compute hashes (the pointer is the hash), which
is nice. Though that comes at the cost of an extra branch in the hash
function.
on the other, we'll have to perform additional checks for ZSTR_IS_PACKED() (may
increase branch miss-prediction and iCache misses).
Andrea, did you try to run some benchmarks? real-life apps?
I think I tried to benchmark it when I first implemented it, but it's
been quite a long time since I last touched it, and I don't have any of
the results on hand. I do seem to remember it being slower, probably
because of the extra branches. In the worst-case scenario, every single
character access from a string gets an extra branch. I don't know how
often that happens; possibly compiler optimisations, or manual code
changes (caching the value of `ZSTR_VAL()`, for example) could deal with
that where it happens. Also, perhaps there's a more efficient way to
implement these macros.
I don't think I tried any real-life apps, at least not successfully. In
order to try them, first you would need to fix anything packed strings
breaks which they rely on.
In summary, unfortunately the answer to most questions here is I don't
know, because it's been more than a year since I last touched that
branch. I might try updating it for current PHP master and test it out
again.
Thanks for your interest!
--
Andrea Faulds
https://ajf.me/
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php