Hi Dmitry, all,
Things are looking really good now, f-i-n-a-l-l-y! Last weekend, after
thinking "no more changes" again, I thought of a couple more improvements,
hah, and now I think there isn't much that could be improved.
So it should be fully alive this weekend, flying, with the fastest/smallest
parameter parsing we could imagine, across all of PHP! I guess that means
you can start looking for it next week...? :-) I may just send a patch sooner,
without first writing up an explanation of the parts like I had planned.
More below...
----- Original Message -----
From: "Matt Wilmas"
Sent: Wednesday, August 05, 2015
Hi Dmitry,
[...]
Unfortunately (only because I said "same macro syntax," but no big deal),
the syntax had to be changed, from:
ZEND_PARSE_PARAMETERS_START[_EX](...)
Z_PARAM_*(...)
Z_PARAM_*(...)
ZEND_PARSE_PARAMETERS_END[_EX]
to
ZEND_PARSE_PARAMETERS_START[_EX](...)( // Parentheses
Z_PARAM_*(...), // Comma-separated
Z_PARAM_*(...)
) ZEND_PARSE_PARAMETERS_END[_EX]
Errors in nested macros might be very difficult to understand :(
I would prefer not to use nested macros without a significant gain.
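A minimal sketch of how the parenthesized, comma-separated form can work at
the preprocessor level. All DEMO_* names are hypothetical and are not the real
Zend macros; this only illustrates the syntax mechanics, not the actual patch.
The START macro's expansion ends in the name of a function-like macro, so the
parenthesized list that follows in the source becomes that macro's arguments
and can be dropped into a single initializer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; these are NOT the real Zend macros. */
typedef struct {
    char  type;  /* 'l' = long, 's' = string */
    void *dest;  /* where a parsed value would be stored */
} demo_spec;

/* Each parameter macro expands to one initializer element. */
#define DEMO_PARAM_LONG(var) { 'l', &(var) }
#define DEMO_PARAM_STR(var)  { 's', &(var) }

/* The expansion ends with DEMO_PARAMS_LIST, so the "(...)" that follows
 * in the source is consumed as that macro's argument list. */
#define DEMO_PARAMS_START(min, max)                   \
    const int demo_min = (min), demo_max = (max);     \
    const demo_spec demo_specs[] = DEMO_PARAMS_LIST

/* The whole comma-separated list becomes one array initializer. */
#define DEMO_PARAMS_LIST(...) { __VA_ARGS__ }

#define DEMO_PARAMS_END                               \
    ; const size_t demo_count = sizeof(demo_specs) / sizeof(demo_specs[0])

/* Writes the collected type letters to `out` and returns the spec count. */
size_t demo_collect_types(char *out, size_t out_size)
{
    long n = 0;
    const char *s = NULL;
    size_t i;

    DEMO_PARAMS_START(1, 2)(
        DEMO_PARAM_LONG(n),
        DEMO_PARAM_STR(s)
    ) DEMO_PARAMS_END;

    assert(out_size > 0 && demo_min <= demo_max);
    for (i = 0; i < demo_count && i + 1 < out_size; i++) {
        out[i] = demo_specs[i].type;
    }
    out[i] = '\0';
    return demo_count;
}
```

Because the list arrives as one unit, the macro can see every parameter at
once, which the statement-per-line form cannot do. A mistake inside the list
is reported against the expanded initializer, which is the readability cost
raised above.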
[...]
Anyway though, it doesn't matter much; not sure what you'll want to do
with all the possibilities I have! And a simple script converts
occurrences to the new syntax for testing (instead of a bigger patch).
Significant gain? Nope. :-) I only did that in order to use the "static"
storage specifier in one place, for a pointer to the packed rodata,
instead of filling it at runtime. But I think the file size was the same
with or without static, even though it saved instructions. So not a
requirement, just part of my experiments.
I was wrong about this; there is "significant gain." A few days after my
last message, I couldn't even figure out my own code, haha, trying to
remember what I tried, and when. Anyway, the macro change was NOT to try the
"static" specifier (that came later), but is the basis for the compiler
"magic" that allows the &dest vars to never be referenced -- e.g. no lea,
etc. in the function. So the minor macro change is important.
Like I said, the BIG neat thing is getting the same optimization (all
except the "static" part) for the *traditional* ZPP. I hadn't touched it
since last message until this week (doing other stuff and too sick ~4 days
to do anything :-/) and wanted to check closer to final code before
replying -- but still looks good with GCC so far!
Clang 3.4 is also generating perfect code for compile-time parsing of
traditional ZPP's format string. I'll monitor it and GCC closely as final
changes are made. I haven't tried older versions yet to see if there's a
minimum version to get compile-time transformation.
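A rough illustration of what "compile-time parsing" of a format string means
here. The helper below is hypothetical (not zpp code) and handles only a small
subset of spec characters: '|' starts the optional section, '!' and '/' are
modifiers, and every other character is counted as one parameter (varargs
specs such as '*' and '+' are not modelled). When the spec is a string literal
and the helper is inlined, an optimizing compiler may be able to evaluate the
loop at build time and leave only the two integer comparisons; whether it
actually does depends on the compiler and version.

```c
#include <assert.h>

/* Hypothetical result type: required and maximum parameter counts. */
typedef struct {
    int min_args;
    int max_args;
} demo_arity;

/* Scans a zpp-style spec such as "sl|b" once and counts parameters. */
static inline demo_arity demo_spec_arity(const char *spec)
{
    demo_arity a = { -1, 0 };
    const char *p;

    for (p = spec; *p != '\0'; p++) {
        switch (*p) {
        case '|':            /* everything after this is optional */
            a.min_args = a.max_args;
            break;
        case '!':
        case '/':            /* modifiers: no argument consumed */
            break;
        default:             /* simplification: one parameter per letter */
            a.max_args++;
            break;
        }
    }
    if (a.min_args < 0) {
        a.min_args = a.max_args;  /* no '|' seen: all parameters required */
    }
    return a;
}

/* The spec is a literal, so the arity is a candidate for constant folding. */
int demo_arity_ok(int num_args)
{
    const demo_arity a = demo_spec_arity("sl|b");
    return num_args >= a.min_args && num_args <= a.max_args;
}
```

Inspecting the generated assembly for demo_arity_ok (for example with -O2 -S)
is one way to check whether a given compiler removed the string scan.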
BTW, I wondered about zpp's "num_args" param -- assumed it was *always*
equal to ZEND_NUM_ARGS(), until I saw some instances with a fixed literal
number. Oops! Luckily, the "check" can be optimized out in all but one
case AFAICT with GCC. And all but a handful of other cases (in pgsql.c)
with Clang.
So depending, there's maybe less interest in my smaller FAST_ZPP
implementation... *shrug*
Never mind that comment...
Weeks ago, I thought it might be desired to not give up inlining in all
cases to get small code. So I thought about a "hybrid" system where the
*smallest* code could be inlined for the *simplest* cases (when function
call would have highest % overhead), otherwise call the function. That's in
the process of being finished now, with some settings (#define's) to control
amount of inlining (or none). I'm hoping the compilers will again remove
everything but the few necessary instructions without me having to make
explicit checks...
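One possible shape for such a hybrid, as a sketch only. The names and the
DEMO_INLINE_MAX_PARAMS setting are hypothetical, and the "parsing" is reduced
to copying longs. The inline wrapper handles the smallest cases itself and
hands everything else to a single shared out-of-line function; when the count
is a compile-time constant at the call site, the compiler can drop whichever
branch is not taken.

```c
#include <assert.h>

#ifndef DEMO_INLINE_MAX_PARAMS
#define DEMO_INLINE_MAX_PARAMS 1  /* hypothetical knob; 0 means never inline */
#endif

typedef struct {
    char type;  /* 'l' = long; anything else is rejected in this sketch */
    long lval;
} demo_arg;

/* Out-of-line general parser: one shared copy for any parameter count. */
int demo_parse_longs_call(const demo_arg *args, int n, long *dest)
{
    int i;

    for (i = 0; i < n; i++) {
        if (args[i].type != 'l') {
            return -1;
        }
        dest[i] = args[i].lval;
    }
    return 0;
}

/* Inline wrapper: small counts are handled in place, the rest are called. */
static inline int demo_parse_longs(const demo_arg *args, int n, long *dest)
{
    if (n > 0 && n <= DEMO_INLINE_MAX_PARAMS) {
        int i;

        for (i = 0; i < n; i++) {
            if (args[i].type != 'l') {
                return -1;
            }
            dest[i] = args[i].lval;
        }
        return 0;
    }
    return demo_parse_longs_call(args, n, dest);
}
```

Building with -DDEMO_INLINE_MAX_PARAMS=0 sends every call through the
function; raising the value trades code size for fewer calls.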
[...]
sub $0x20,%rsp # 16 bytes more; each parameter needs 16 bytes stack
I realized that the stack space could easily be reduced to 8 bytes per &dest
var, instead of 16/param... Doesn't really matter I guess, but now on
64-bit, stack space is same as normal &dest vars, except zend_bools.
(Compiler effectively *removes* all dest vars. :^))
[...]
That (optimizing traditional string ZPP) will be the *equivalent* of 64KB+
of C code (repetition), all reduced to nothing. :-) And more of that
should (will) be packed together. Hopefully this continues, and with
other compilers, on non-Windows anyway.
Don't know about Windows now... Visual Studio 2008 and 2012 (not much
difference) are NOT optimizing away the code (other times it was GCC with
issues). :-/ Not sure why. Of course they don't support the necessary
compound literals anyway, but I was just testing a manual case... I'll
have to try and check 2015 version soon.
Nope, VS 2015 still won't optimize away any of it, it seems. So looks like
no compile-time transformation of traditional zpp on Windows...
Regardless, there will be a fallback function to be called with optimized
runtime string parsing, to be used if compilers don't create optimized
code. I'll be checking more compilers, of course...
For the sake of Windows (or any other fallbacks), I really wanted to
optimize zpp well for runtime string parsing. After overthinking it, it's
fairly simple, and what could've been there all along. What we should wind
up with is traditional zpp that's as good as the "fast parsing" function,
except: 1) lea, etc. for vars at call site, and 2) the string has to be
looked at ONCE, instead of 6-7 times now. :-O
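A toy model of the single-pass idea, under stated assumptions: demo_value and
demo_parse_once are hypothetical, only 'l', 's' and '|' are understood, and no
type conversion is attempted (real zpp does far more). The point is only that
argument-count checks and per-parameter handling can share one walk over the
spec string instead of separate scans.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical stand-in for a passed argument. */
typedef struct {
    char        type;  /* 'l' = long, 's' = string */
    long        lval;
    const char *sval;
} demo_value;

/* Walks `spec` once. Returns 0 on success, -1 on any mismatch. */
int demo_parse_once(const demo_value *args, int num_args, const char *spec, ...)
{
    va_list ap;
    const char *p;
    int i = 0;          /* index of the next argument to consume */
    int optional = 0;   /* set once '|' has been seen */
    int rc = 0;

    va_start(ap, spec);
    for (p = spec; *p != '\0'; p++) {
        if (*p == '|') {
            optional = 1;
            continue;
        }
        if (*p != 'l' && *p != 's') {
            rc = -1;            /* spec character not modelled here */
            break;
        }
        if (i >= num_args) {
            if (!optional) {
                rc = -1;        /* a required parameter is missing */
            }
            break;              /* remaining parameters are optional */
        }
        if (args[i].type != *p) {
            rc = -1;            /* wrong type; no conversion in this sketch */
            break;
        }
        if (*p == 'l') {
            *va_arg(ap, long *) = args[i].lval;
        } else {
            *va_arg(ap, const char **) = args[i].sval;
        }
        i++;
    }
    va_end(ap);

    if (rc == 0 && i < num_args) {
        rc = -1;                /* more arguments passed than the spec allows */
    }
    return rc;
}
```

A call looks like demo_parse_once(args, count, "l|s", &n, &s), which mirrors
the traditional call shape: the destinations are still passed by address at
the call site.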
- Matt
--
PHP Internals - PHP Runtime Development Mailing List