Hi Dmitry,

Dmitry Stogov wrote:
I was skeptical about this idea, but the PoC looks interesting and quite simple.

This might be too big change for 7.*,  if we won't completely hide 
implementation details for extensions using existing macros...

I've wondered about 7.x. You can't *completely* hide the implementation details, because some existing assumptions about pointers to zend_strings can't hold if the value is held within the pointer itself, rather than in the place it points to. A simple example:

    zend_string *a = zend_string_init("foo", 3, 0);
    zend_string *b = a;
    ZSTR_VAL(a)[0] = 'b';
    ZSTR_VAL(a)[1] = 'a';
    ZSTR_VAL(a)[2] = 'r';
    return b;

Without packed strings, if you return b, you get a zend_string containing "foo". With packed strings, you get one containing "bar".

Packed strings don't create any problems if you're just passing on strings or creating new ones from existing data. But they can create problems if you mutate an existing string (or create an empty string and then fill it), and hold more than one pointer to it.

The difficult part is that I don't (currently) know how much code makes these assumptions or is broken by it. I do know that there is code that is broken, because PHP produced some unusual errors when I tried that branch, though it did at least run.

One nice thing, though, is that the API itself doesn't really change, you just have to be more careful about how you use it. So, optimistically, breakage might be quite small. I don't know how much breakage there is currently.

I'm not sure if this will lead to performance improvement.

On one hand, we won't need to read elements form referenced zend_string 
structure (should improve data locality and cache usage),

You also don't have to compute hashes (the pointer is the hash), which is nice. Though that comes at the cost of an extra branch in the hash function.

on the other, we'll have to perform additional checks for ZSTR_IS_PACKED() (may 
increase branch miss-prediction and iCache misses).


Andrea, did you try to run some benchmarks? real-life apps?

I think I tried to benchmark it when I first implemented it, but it's been quite a long time since I last touched it, and I don't have any of the results on hand. I do seem to remember it being slower, probably because of the extra branches. In the worst-case scenario, every single character access from a string gets an extra branch. I don't know how often that happens; possibly compiler optimisations, or manual code changes (caching the value of `ZSTR_VAL()`, for example) could deal with that where it happens. Also, perhaps there's a more efficient way to implement these macros.

I don't think I tried any real-life apps, at least not successfully. In order to try them, first you would need to fix anything packed strings breaks which they rely on.


In summary, unfortunately the answer to most questions here is “I don't know”, because it's been more than a year since I last touched that branch. I might try updating it for current PHP master and test it out again.

Thanks for your interest!

--
Andrea Faulds
https://ajf.me/

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to