On Tue, Mar 14, 2017 at 11:29 AM, David Zuelke <d...@heroku.com> wrote:

> Hi all,
>
> There appears to be a performance regression in the CFG and DFA based
> optimization passes of OPcache in PHP 5.6+ when loading huge classes (such
> as those generated by Symfony's routing component) for the first time.
>
> The issue does not occur on PHP 5.5. Tests below are with 5.6.30, 7.0.16
> and 7.1.2 and default INI settings; I replicated the issue on both macOS
> and Linux.
>
> Test file here (it's from an actual application, slightly anonymized, not
> a synthetic example): https://gist.github.com/dzuelke/fe867f55f09e0bf79ecefcc815b7fe92
>
> Without OPcache, everything is fine in all versions:
>
> $ time -p php -dopcache.enable_cli=0 hugeclass.php
> real 0.10
> user 0.09
> sys 0.00
>
> With OPcache on, things are suddenly much, much slower:
>
> 5.6:
>
> $ time -p php -dopcache.enable_cli=1 hugeclass.php
> real 3.23
> user 3.21
> sys 0.02
>
> 7.0:
>
> $ time -p php -dopcache.enable_cli=1 hugeclass.php
> real 1.76
> user 1.73
> sys 0.02
>
> 7.1:
>
> $ time -p php -dopcache.enable_cli=1 hugeclass.php
> real 4.01
> user 3.98
> sys 0.02
>
> For comparison, 5.5 is as speedy as you'd expect it to be:
>
> $ time -p php -dopcache.enable_cli=1 hugeclass.php
> real 0.14
> user 0.11
> sys 0.02
>
> If we switch off optimization passes 5 (CFG based) and 6 (DFA based, only
> in 7.1), everything is great again in all versions:
>
> $ time -p php -dopcache.enable_cli=1 -dopcache.optimization_level=0x7FFFFFCF hugeclass.php
> real 0.13
> user 0.10
> sys 0.02
>
> For 5.6 and 7.0, pass 6 is not a thing, but in 7.1, we can inspect passes
> 5 and 6 separately.
>
> Pass 5 (CFG based) already makes for a drastic performance hit in 7.1:
>
> $ time -p php -dopcache.enable_cli=1 -dopcache.optimization_level=0x7FFFFFDF hugeclass.php
> real 0.88
> user 0.86
> sys 0.01
>
> But pass 6 (DFA based) is the one that causes the biggest slowdown in 7.1:
>
> $ time -p php -dopcache.enable_cli=1 -dopcache.optimization_level=0x7FFFFFEF hugeclass.php
> real 3.29
> user 3.24
> sys 0.04
>
> In all versions, subsequent loads from the cache (such as when running FPM
> or the built-in web server) are fast.
>
> Is this slowness with a cold cache expected/accepted, or does that qualify
> as a bug?
>

Yes, this is a bug. Optimization should never be this slow.

From a quick perf run, the problem in the DFA pass is that type inference (TI) is
quadratic in the number of calls, as call lookup is implemented as a linear scan.
This can be fixed with a more efficient map lookup.
The problem in the CFG pass is that a large buffer is zeroed repeatedly.

Nikita
