Hi Matt,

On Wed, Jul 22, 2015 at 11:16 PM, Matt Wilmas <php_li...@realplain.com> wrote:

> Hi again Dmitry, all,
>
> Hopefully the final update on this, before all is revealed... :-)
>
> ----- Original Message -----
> From: "Matt Wilmas"
> Sent: Tuesday, July 07, 2015
>
> Hi again Dmitry, all,
>>
>> [...]
>>
>> Just an update... I didn't abandon this; quite the opposite! I thought
>> I'd just put the finishing touches on my implementation and have it to you
>> almost a week ago. After my rough initial test version, I made some
>> obvious, simple changes to reduce instructions/code size (slightly). And
>> then analyzing different stuff with GCC and MSVC to see if it could be
>> improved more (not really since fairly straightforward), etc...
>>
>> ~5 days ago when I was done messing and changed the macros to recompile
>> the existing FAST_ZPP parts, I didn't know what the size difference would
>> be vs no FAST_ZPP (traditional). I had overestimated the savings ("maybe a
>> few more bytes" for instructions). It was in the 30-45% range of your
>> inlined version.
>>
>> I made a change to save instructions, but, strangely, it didn't really
>> have the effect on size I thought it would. :-/
>>
>> BTW, the improvement on Linux with GCC 4.8 was about the same: ~70% of
>> inlined. So roughly ~2/3 speed for ~1/3 space. I also finally installed
>> Valgrind and used Callgrind for the first time. Simple. :^) About same
>> relative reduction in instructions.
>>
>> I really wanted the code size to be smaller if this could get widespread
>> use, and started wondering, "What if...?", "How?", "Why not?", "But..."
>>
>> Then I had a new idea, but wasn't sure what the compilers would do with
>> it. So I spent Sunday prototyping a couple key parts of it outside of PHP.
>> GCC can make a HUGE mess of it, but easily worked around. So it looks
>> good, even better than the ideal I had imagined. Now I just have to do it
>> for PHP...
>>
>> This way saves the lea instructions for each &dest variable (like the
>> inline version), and then some. And just earlier I realized there's a way
>> to save the other instructions (while using the same macro syntax), which
>> would also apply to the previous implementation.
>>
>> So ideally, this means at the CALL site, we should be able to have the
>> zend_fast_parse_... function call: Just mov+mov+lea+call on 64-bit, and
>> that's it. The rest of the stuff (a good amount) can be COMPLETELY
>> optimized away! :-O
>>
>> And in the parse_... function, compared to the *inline* FAST_ZPP, that
>> should get it down to about 3 dozen more instructions per parameter:
>> while + switch + checks in zend_parse_arg_* that would get optimized away
>> when inlined.
>>
>> Well, I'll send the implementation(s) for you to test as soon as I can!
>>
>
> I tried to rush and finish things up before the weekend *2 weeks ago*, but
> it took me too long to get the macros sorted out and working right. :-/
> Sorry for the delay, but more and better goodness should now be included.
> The extra time allowed me to "relax and take notes" (Notorious B.I.G.),
> however. :-D
>
> So yeah, that was all working 10 days ago. Then I realized more function
> param data could be packed together which saved another mov instruction --
> so at the call site, it's just mov+lea+call on 64-bit (since execute_data
> is already in %rdi). There's nothing else (ignoring checking return
> value/return on error, etc.), and each &dest variable is filled in even
> though their address isn't taken (thanks to compiler magic). The only
> exceptions are FUNC (4 instructions I think) and OBJECT_OF_CLASS and
> VARIADIC (1 instruction) types.
>
> Unfortunately (only because I said "same macro syntax," but no big deal),
> the syntax had to be changed, from:
>
> ZEND_PARSE_PARAMETERS_START[_EX](...)
>     Z_PARAM_*(...)
>     Z_PARAM_*(...)
> ZEND_PARSE_PARAMETERS_END[_EX]
>
> to
>
> ZEND_PARSE_PARAMETERS_START[_EX](...)(   // Parentheses
>     Z_PARAM_*(...),                      // Comma-separated
>     Z_PARAM_*(...)
> ) ZEND_PARSE_PARAMETERS_END[_EX]
>

Errors in nested macros might be very difficult to understand :(
I would prefer not to use nested macros without a significant gain.

>
> Overall, the *code* size is reduced (vs traditional ZPP), but the file
> size isn't (static stuff in rodata or whatever), which was a bit
> surprising, although most of these PHP functions don't have many
> parameters...
>

I can only guess where this static data came from, because I haven't seen
the code yet :)

Thanks,
Dmitry.

>
> The biggest size savings actually came from the simple initial
> optimization of zend_parse_params_none(). Down to almost nothing, much
> faster, and saved 4KB on my --disable-all builds.
>
>
> NEW GOODNESS -- What would of course be nice to have is a big optimization
> of the traditional zend_parse[_method]_parameters[_ex|_throw] to avoid
> changing them all. And it seems some people, like Derick, prefer it.
>
> Of course the obvious way I first had in mind weeks ago was to simply
> parse its format string faster (once-ish) at runtime, and then feed it to
> this new FAST_parse function. Should give at least 2x speedup I figured.
> But with this latest implementation, where the function should probably now
> be called parse_parameters_ARRAY instead of fast_parse, it would need a
> second pass after parsing the string. Not a huge deal, but...
>
> What would be *really nice* is to have the compiler parse the format
> string, at compile time, and use the new system directly. And... that
> should be possible!! 8-)
>
> Last week I figured GCC's "statement expressions" [1] could be used, which
> most compilers seem to support, except MSVC. But just over the weekend I
> realized an inline function could be used with a compound literal (for the
> varargs), which is also supported in the latest MSVC versions. Awesome!
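To make the "inline function + compound literal" idea concrete, here is a minimal sketch under stated assumptions. Everything in it is hypothetical and chosen for illustration: `arg` is a stand-in for a zval, `parse_array` and `PARSE_ARGS` are invented names (not PHP's API and not Matt's patch), and the one-character spec ('l' = long, 'b' = bool) only loosely mimics a ZPP format string. The macro wraps the destination pointers in a `(void *const[]){ ... }` compound literal, so the callee receives an array whose length is a compile-time constant instead of C varargs, and no GCC statement expression is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a zval: just enough to show type checking. */
typedef enum { ARG_LONG, ARG_BOOL } arg_type;
typedef struct {
    arg_type type;
    long value;
} arg;

/* Hypothetical parser: one spec character per destination pointer.
 * 'l' stores a long, 'b' stores a boolean as an int (0 or 1).
 * Returns 0 on success, -1 on a count or type mismatch. */
static inline int parse_array(const arg *args, size_t argc, const char *spec,
                              void *const dests[], size_t ndests)
{
    size_t i;

    /* A spec/destination mismatch is a programming error, not user input. */
    assert(strlen(spec) == ndests);
    if (argc != ndests) {
        return -1;
    }
    for (i = 0; i < ndests; i++) {
        switch (spec[i]) {
        case 'l':
            if (args[i].type != ARG_LONG) {
                return -1;
            }
            *(long *)dests[i] = args[i].value;
            break;
        case 'b':
            if (args[i].type != ARG_BOOL) {
                return -1;
            }
            *(int *)dests[i] = args[i].value != 0;
            break;
        default:
            return -1;
        }
    }
    return 0;
}

/* The compound literal turns the destination list into an array. The
 * sizeof operand is not evaluated (no VLA), so the pointer expressions
 * are only evaluated once, in the array actually passed. */
#define PARSE_ARGS(args, argc, spec, ...)                               \
    parse_array((args), (argc), (spec),                                 \
                (void *const[]){ __VA_ARGS__ },                         \
                sizeof((void *const[]){ __VA_ARGS__ }) / sizeof(void *))
```

Usage would look like `PARSE_ARGS(in, 2, "lb", &n, &flag)`. Whether a given compiler actually folds the spec switch away when `spec` is a string literal and the call is inlined depends on the compiler and optimization level; the sketch only shows that the syntax itself needs nothing beyond C99.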
>
> And again, fear not, ALL the code can be completely removed by the
> compiler, leaving only movb instructions instead of lea+mov/push for the
> traditional ZPP function call. So, better than my initial
> implementation(s), and nearly the same as my final macro version! I was
> just testing prototypes of portions with GCC yesterday, which does fine
> after adjusting to not generate *horribly stupid* code.
>
> Now to implement it into PHP ASAP! Then I'll save a few more
> branches/instructions in the parse function (specialized for common cases;
> some useless GCC instructions), comment and clean up my experimental mess,
> and write up some explanation of the changes before sending patch. Oh, and
> I should verify what Clang does with the code as well...
>
> Stay tuned!
>
> [1] https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html
>
>
> - Matt
>
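The packing idea mentioned above (folding more per-function parameter data into one value so the call site needs fewer mov instructions) can be sketched roughly as follows. The bit layout and the names `ZPP_PACK`, `ZPP_MIN`, `ZPP_MAX`, `ZPP_FLAGS`, `ZPP_FLAG_THROW` and `check_arg_count` are hypothetical, picked for illustration only; they are not PHP's actual encoding or Matt's patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: min args in bits 0-7, max args in bits 8-15,
 * flags in bits 16-23. */
#define ZPP_PACK(min, max, flags) \
    ((uint32_t)(min) | ((uint32_t)(max) << 8) | ((uint32_t)(flags) << 16))

#define ZPP_MIN(spec)   ((unsigned)((spec) & 0xFFu))
#define ZPP_MAX(spec)   ((unsigned)(((spec) >> 8) & 0xFFu))
#define ZPP_FLAGS(spec) ((unsigned)(((spec) >> 16) & 0xFFu))

/* Hypothetical flag: report errors by throwing instead of warning. */
#define ZPP_FLAG_THROW 0x01u

/* Out-of-line checker that receives one packed integer instead of three
 * separate arguments. Returns 0 if argc is in range, -1 for a
 * "warning" style failure, -2 for a "throw" style failure. */
static int check_arg_count(uint32_t spec, unsigned argc)
{
    /* An inverted range is a programming error in the spec itself. */
    assert(ZPP_MIN(spec) <= ZPP_MAX(spec));
    if (argc < ZPP_MIN(spec) || argc > ZPP_MAX(spec)) {
        return (ZPP_FLAGS(spec) & ZPP_FLAG_THROW) ? -2 : -1;
    }
    return 0;
}
```

Because an expression like `ZPP_PACK(1, 3, ZPP_FLAG_THROW)` is a constant expression, the caller passes a single 32-bit immediate; on x86-64 that is typically one mov into an argument register, though the exact instruction sequence depends on the compiler and calling convention.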