Hi,
I was skeptical about this idea, but the PoC looks interesting and quite simple. This might be too big change for 7.*, if we won't completely hide implementation details for extensions using existing macros... I'm not sure if this will lead to performance improvement. On one hand, we won't need to read elements form referenced zend_string structure (should improve data locality and cache usage), on the other, we'll have to perform additional checks for ZSTR_IS_PACKED() (may increase branch miss-prediction and iCache misses). Andrea, did you try to run some benchmarks? real-life apps? Thanks. Dmitry. ________________________________ From: Andrea Faulds <a...@ajf.me> Sent: Tuesday, September 13, 2016 9:26:03 PM To: internals@lists.php.net Subject: Re: [PHP-DEV] Directly embed small strings in zvals Hi Ben! Benjamin Coutu wrote: > I was wondering if it would make sense to store small strings (length <= 7) > directly inside the zval struct, thereby avoiding the need to extra allocate > a zend_string, which would also not entail any costly indirection and > refcounting for such strings. > > The idea would be to add a new sruct ``struct { uint8_t len; char val[7]; } > sval`` to the _zend_value union type in order to embed it directly into the > zval struct and use a type flag (zval.u1.v.type_flags) such as > IS_SMALL_STRING to destinguish between a regular heap allocated zend_string > and the directly embedded compact representation. > > Small strings are quite common IMHO. In fact quickly sampling my company's > PHP code base I found well over 50% of the strings to be of length <= 7. It > would save a lot of memory allocations as well as pointer indirection, and > could also bypass refcounting logic. Also, comparing small strings for > equality would become a trivial operation (just comparing two pre-aligned > 64bit integers) - no more need to keep small strings interned. > > Of course it wouldn't longer be possible to also persistently store the hash > value of a small string, though calculating the hash value for small strings > is less costly anyways because less characters equals less iterations, so > that might not be an issue in practice. > > I don't see such an idea in https://wiki.php.net/php-7.1-ideas and I was > wondering: Has anybody experimented with that approach yet? Is it worth > discussing? Funnily enough, I was thinking of trying to implement this recently. It's an interesting idea. I've previously tried implementing this with a slightly different approach, whereby the string is embedded within a tagged zend_string pointer, rather than as a separate zval type. This would mean you only need to change the zend_string functions and the string macros, and you can use such strings anywhere a zend_string pointer is used, not just in zvals. My quick-and-dirty implementation can be found here: https://github.com/php/php-src/compare/master...TazeTSchnitzel:packedStrings However, I don't think it benchmarked that well, and it broke some things (when you contain the string data inside the pointer itself, it breaks certain assumptions existing code makes). I don't know how performant such an approach could be, considering it adds an extra branch at the site of every index into a string (though some sort of caching, or compiler optimisations, might improve this). And I don't know quite how much stuff it breaks. My inspiration was a blog post about how Apple's Objective-C packs strings inside pointers, where possible. I think it was this one: https://www.mikeash.com/pyblog/friday-qa-2015-07-31-tagged-pointer-strings.html My approach was less sophisticated than Apple's. Whereas Apple can store strings as 8-bit, 6-bit or 5-bit, I took the easier approach of just having 8-bit strings, for direct indexing. For a 64-bit pointer, my implemention let you have up to 6 bytes for character data, 1 byte for a null terminator, and 1 byte for containing the pointer tag bit. Your suggested approach (packing the string inside the zval as another kind of zval) is also possible, but I haven't tried it. It would have the advantage of letting you use strings that are a byte longer, but you could only use it for strings in zvals, and it would possibly require more code changes overall (adding support for a new zval type everywhere, vs. only updating code making the wrong assumptions about zend_string pointers). Also, a potential problem is that possibly a lot of PHP functions deal with strings only as zend_strings, so PHP might end up converting back and forth between packed/small string zvals and zend_strings just to satisfy such functions. In particular, zend_parse_parameters might have to do that. Your approach is less hacky and more portable, though. Notably, it works on 32-bit systems (whose pointers are too small for useful string packing, but which still give you 8 bytes to play with in the zval). Regarding hash values, there's actually a simple solution to that, at least on 64-bit platforms. I changed the hash function to check the string length, and if it's small enough to be a packed string, use the packed string as the hash itself. (After all, they're both 64-bit numbers!) This costs nothing if the string is already a packed string, and if it isn't, the conversion is extremely simple. Of course, this comes at the cost of adding an extra branch to the hash function for non-packed strings. Anyway, generally, I have no idea if either approach is all that good an idea or not. More research might be needed. It's certainly fun to discuss! :) -- Andrea Faulds https://ajf.me/ -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php