On Fri, Dec 20, 2024 at 8:29 PM Larry Garfield <la...@garfieldtech.com> wrote:
> Background: PHP has a not-often-considered feature, the stat-cache. That
> is, the runtime caches the OS stat() call for files, so that subsequent
> reads on the same file can be faster. However, it's even less realized
> that it's a single-file cache. It literally only applies when you try to
> do two file-information operations on the same file in rapid succession,
> without any other file reads in between.
>
> For more info:
> https://tideways.com/profiler/blog/the-php-stat-cache-explained
>
> Because it's so rarely relevant, in the cases it is relevant, it can be
> quite a surprise, and a surprise causing weird and hard to explain caching
> bugs in applications.
>
> The cache also dates from 20 years ago, when Rasmus added it (and the
> realpath cache) in Yahoo's forked PHP 4, and then it got integrated into
> PHP 5. However, hard drives are vastly faster than they were then, and
> operating systems are vastly more efficient than they were then.
>
> There's been some discussion about making the cache disable-able, though
> the consensus now seems to be leaning toward getting rid of it outright:
>
> https://github.com/php/php-src/pull/17178
>
> Arnaud ran some quick benchmarks and found that disabling it has a less
> than 1% impact on Symfony and WordPress.
>
> https://github.com/php/php-src/pull/17178#issuecomment-2554323572
>
> Before we go any further, is there appetite among the voting population to
> remove it? clearstatcache() and similar functions would get stubbed out as
> no-ops, but otherwise we'd just hand the responsibility back to the OS
> where it belongs, which seems so far like it would be almost an
> unmeasurable performance difference but remove some surprise complexity.
>
> Would you support such a removal?
> What additional data would you need to make the case for such removal?
>

I would prefer to disable it by default but keep some option (INI) to re-enable it. I think that for most users the perf impact will be negligible.
However, it is quite likely that there are some user workflows and platforms where the stat cache can still make a significant difference in performance. Those users should have the option to re-enable it if they see a significant regression, rather than being forced to update their code to make it faster or to implement their own cache, which would just make their migration to the next version much harder or potentially impossible. The maintenance burden is not so large that we really need to get rid of it completely. I would much prefer having such an option and telling users to re-enable it over being unable to deal with perf regressions that might get reported in the future.

I think the main issue with the cache is that it is just not convenient in cases where a file is changed through an access method that doesn't trigger a flush. We could probably improve the stream situation a bit, but that still leaves the external (e.g. shell) access problem in place, which we just cannot fix. On the other hand, it is possible to use the cache in a way that users can profit from, but they really need to know how it works. That's why it should be an optional feature IMO. We should also improve the documentation in that regard.

In terms of voting, if there was no option to re-enable it, I would probably vote against this proposal, as I'm a bit worried about those possible regression reports.

Regards

Jakub