On Monday, 8 July 2024 at 04:04, Juliette Reinders Folmer <php-internals_nos...@adviesenzo.nl> wrote: > On 2-7-2024 20:05, Gina P. Banyard wrote: >> On Tuesday, 2 July 2024 at 10:52, Juliette Reinders Folmer >> <php-internals_nos...@adviesenzo.nl> wrote: >> >>> * While a number of proposals include an impact analysis (thank you!), a >>> significant number of the proposals don't. >>> It would be appreciated if for those proposals which aren't removing >>> unused/unusable functionality, some sort of impact analysis was added. >> >> You will need to clarify which ones you are talking about. >> These "bulk removal" RFCs are written by various authors over the course of >> a year, and might not have been looked at for 9+ months. > > I'd suggest for an impact analysis/expected impact statement to be added to > the following deprecation proposals:
I am going to start this reply with the following: An impact analysis showing a large impact to userland, is not in itself, an argument against a deprecation. What an impact analysis helps to determine is the length of the deprecation and the timeline for removal. It is getting exhausting to need to provide this, when what it is, is me asking Damien to check usage on the corpus of over 3100 projects, some open source (such as Wordpress, Drupal, OSS accounting software, etc.), top 1000 composer packages, and the private codebases he has access via his company, using his Exakat static analysis tool. [1] The corpus is 160 MLOC (1.2 Billion tokens), 1.4 M files and as already mentioned over 3100 distinct projets. But his tool will sometimes report duplicates, and has outdated versions which might not be affected by the issues anymore. One reason is that some projects inline composer dependencies, and unless I do a painstaking manual review I cannot narrow this down. Especially as it takes time to run the analysis on the corpus, and if I don't ask the precise question I don't get all the relevant stats. So every stat is a conservative approximation. We don't decide to deprecate and remove things for the fun of it. But if something is misleading, badly designed, dangerous, has a security risk, or causes issues it should be deprecated. It is my belief that it does not matter if this affects 10, 10 000, or 10 000 000 codebases. However, how and when we remove this, yes this is affected by the usage. > * session.sid_length and session.sid_bits_per_character Auditing INI setting usages is effectively impossible with Exakat. Misusing these settings can lead to security issues, and the new values will match the existing defaults. I would guess that the majority of users don't even know about this setting and thus are not affected. Similarly, it seems likely that application developers are also not aware of it, causing applications to break if a hosting provider would adjust these settings. For example: if the application expects it to be a specific format, which is defined in the database schema. Considering the above, these INI settings should be removed and deprecated, regardless of impact. > * xml_set_object() and xml_set_*_handler() with string method names This behaviour is unintuitive and breaks all usual language semantics. This should be deprecated and removed regardless of impact. But when I was working on this I had asked Damien to run some analysis with Exakat and found 66 projects. To which I have sent PRs to some to remove the usage, which is extemely simple to do. To clarify the rationale, the following code is ambiguous: `xml_set_element_handler($p, 'strrev')` It either calls the \strrev() string function, or a method called `strrev` on the object provided by a call to `xml_set_object()`. This is going to be the logic as of PHP 8.4 after some refactoring I did last October. In the current released versions it is even more ambiguous, as the object provided by xml_set_object() could be passed *after* setting the string callable. This behaviour is totally unintuitive, so regardless of the impact it should be removed. > * Deprecate proprietary CSV escaping mechanism This is a follow-up on an RFC whose first step was implemented in PHP 7.4. (https://wiki.php.net/rfc/kill-csv-escaping) The first step was implemented (https://github.com/php/php-src/pull/3515) without a vote being held following the discusion on internals: https://externals.io/message/103268 This routinely bites people, and we still get issues about people being confused about this parameter. We really should address this, and not wait yet another 5 years for complaints to once again be raised before we take any action. > * Deprecate strtok() function Symfony agrees with the rationale provided by the RFC and has banned the function from their project: https://github.com/symfony/symfony/issues/57542 This seems to indicate that the rationale around it is sound. But just for the sake of it, I asked Damien, and I don't have the total number of usages, just that at least one call to strtok is made in 274 projects. I have no idea which projects, and whether the majority of them are in a bunch of libraries or not. > * Deprecate returning non-strings values from a user output handler > * Deprecate producing output in a user output handler This is hard to analyse as it depends on runtime execution. However, the current behaviour when doing one of these things is questionable and/or broken. And I firmly believe this should be deprecated/changed regardless of impact. > * Deprecate mysqli_refresh() > * Deprecate mysqli_kill() These are following upstream deprecations from MySQL. > * Deprecate lcg_value() This function is effectively broken. Thus, I do not see what benefit we get from an impact analysis. > * Deprecate md5(), sha1(), md5_file(), and sha1_file() (add an actual > analysis, not just a statement as this is a high impact proposal) To circle back to the beginning, what does a detailed analysis brings us here? Tim is aware this has potential to impact a lot of code, which is why it is *explicitly* not being slated for removal in 9.0, and would require a follow-up RFC to remove it. Moreover, this is in the same vein as when we deprecated utf8_decode() and utf8_encode() in PHP 8.2: https://wiki.php.net/rfc/remove_utf8_decode_and_utf8_encode Tim slightly adjusted the wording of the RFC to make it clearer that the suggested replacements are only intended for users that are locked into the algorithm choice. I struggle to see a good reason to use MD5 in 2024, and I would hope that no-one uses MD5 to hash passwords in 2024, but somehow I doubt that. I'm also trusting Tim to implement the deprecation message, and the changelog entry on the manual, in a way that prompts users to re-evaluate their choice of algorithm rather then blindly using the hash extension with MD5/SHA1. But I did ask Damien, and he has told me that for each function there are that many projects that use the function at least once. I don't have any idea if it stems from a library, if they only used it once or 10 000 times in the project, nor for what purpose. md5: 862 sha1: 495 sha1_file : 85 md5_file : 245 > * Deprecate passing E_USER_ERROR to trigger_error() This is to limit usage and access to the bailout mechanism, and better alternatives exist and should be used. This is prime example of deprecations being the correct tool. > * Deprecate SOAP_FUNCTIONS_ALL constant and passing it to > SoapServer::addFunction() To me, this is a security issue first and foremost, and therefore we should discourage its use and remove it. However, once again, I've asked Damien to run a quick analysis and 182 projects use it, mainly Symfony and Drupal. > And to a lesser degree for: > * Formally deprecate Soft-deprecated DOMDocument and DOMEntity properties We are following the DOM Spec here. Thus I don't see how an impact analysis is useful. > * Deprecate SplFixedArray::__wakeup() This never worked properly. Thus I don't see how an impact analysis is useful. > * Deprecate passing null and false to dba_key_split() This also never worked properly and is a bug. Thus I don't see how an impact analysis is useful. > * Deprecate passing incorrect data types for options to ext/hash functions This is potential security issue, and only possible to know at runtime, so regardless of impact this should be removed. However, Niels did add such an analysis to the RFC. > * Constants SUNFUNCS_RET_STRING, SUNFUNCS_RET_DOUBLE, SUNFUNCS_RET_TIMESTAMP This is a follow-up on a deprecation enacted in PHP 8.1, and arguably should have been done at the same time, cf. https://wiki.php.net/rfc/deprecations_php_8_1#date_sunrise_and_date_sunset > * Remove E_STRICT error level and deprecate E_STRICT constant This error level had only 2 strange uses in PHP 7, and has been completely removed in PHP 8. I don't see what benefit an impact analysis would bring here, we are just deprecating/removing cruft at this point. > * mysqli_ping() and mysqli::ping() This is broken as of PHP 8.2. Thus I don't see how an impact analysis is useful. > P.S.: typo in "xml_set_object() and xml_set_*_handler() with string method > names": "witch" => "which" Fixed >>> Other than that, I join the previously voiced objections to the deprecation >>> of `uniqid()`, `md5()`, `sha1()`, `md5_file()`, `sha1_file()`. >>> While I acknowledge that these functions _can_ be used inappropriately for >>> security-sensitive code, which should use alternative methods, these >>> functions have perfectly valid use-cases for non-security-sensitive code >>> and the impact of the BC-break of deprecating and eventually removing these >>> methods can, IMO, not be justified. >>> Keep in mind that while "we" know and understand that deprecations are not >>> errors, end-users often don't and particularly for open source projects, >>> this means that in practice these deprecations will need to be addressed >>> anyway to reduce the noise of users opening issues about them, which >>> without a clear path to removal of the functions, will, in a lot of cases, >>> mean adding the `@` operator to all uses. >> >> If I may be a bit cheeky, if we consider that userland does not understand >> that deprecations are not errors, how can we trust them to use the 5 >> aforementioned functions correctly? >> Especially as there are more appropriate replacements available. > > There is a difference between "userland" (dev-users) and end-users. I was > talking about end-users, while based on your remark, you are talking about > dev-users. I am unsure what you mean by "end-users" here, I am going to assume you mean PHP developers that write PHP code using PHP libraries and/or frameworks. Because if you refer to "end-users" as people that install WordPress (or whatever PHP application) via something like CPanel, this is a totally different conversation. I sincerely appreciate that you are very much a tooling and library ecosystem developer, but from a core developer PoV that someone is an "end-user" or a "dev-user" is not a practical distinction. We cannot make a function available only to "dev-users" that know what they do, we need to consider the whole userbase, and arguably end-users are the largest proportion of this. Thus if end-users do not understand that deprecations are not errors, how should I expect end-users to read the documentation? A deprecation is loud and clear, the documentation can be easily ignored, and there is no way to verify if the aforementioned functions are used correctly. I will add, I very much dislike the argument "but it is clearly explained on the documentation, so it is not a problem". I do not know about most people, but frankly, I do not look up the documentation everytime I want to use a function, similarly to me not having consulted a dictionary to verify the meaning of the words I'm using before typing them. In the same way that human languages will "deprecate" words by discouraging their usage, we ought to do the same for the language we write our code in. If the issue is that you, as a maintainer, don't have a page on php.net to point to users where we clearly say "Promoting deprecations to Exceptions is wrong and bad" then we can make such a page. Tim made a PR, which is now live [2], to the ErrorException page changing the example to a correct error handler promoting warnings and notices to exceptions but not deprecations. However, if even creating such a page is not enough, then maybe we need to do some engine level changes where we properly split out deprecations from the other diagnostics being emitted, so that it becomes *impossible* for people to promote deprecations to exceptions (which somehow I'm thinking that people like Marco and Nicolas would appreciate). These are all conversations that we can have. However, "stop deprecating things" is not a "solution" to people not understanding what deprecations are. This has come up again and again, and the answer has been constant, it is unacceptable to tell the project to not deprecate things. It is one of our limited tools to actually make changes to the language, removing this is not an option. Because if we do remove this option, I can definitely see people starting to create their own flavours of PHP to fix stuff that apparently must be set in stone in the official language. Which I don't think many people want to see this. And, considering that most of the deprecations are in *extensions* this is, yet again, reinforcing my opinion that we should unbundle all extensions so that they can move at their own pace and that users can install whatever version of said extension they want. Moreover, if you permit me to do an aside from another industry. The construction industry in Europe is going through a massive overhaul via the second generation of Eurocodes. [3] All of the final drafts were submitted prior to October of 2023, and will be finalized, translated into German/French, and voted on prior to 30 March 2026. These new standards will then be implemented at a national level by all 34 members of CEN [4] via their national standard body (e.g. the British Standard Institute for the UK) by 30 September 2027. Finally, the previous versions of the standard *must* be withdrawn by 30 March 2028. Therefore, if the final version of a standard is published at the latest possible time, there is at most a two year transition period for a large part of an *industry* and, at minimum, 34 national standardization bodies to adopt the new standard, deprecate and withdraw the old one. It is possible to use the new Eurocodes prior to them becoming mandated at the national level (which the member countries of CEN are obligated to do), but once that step is taken using the previous Eurocodes will require a legal exemption. Meanwhile, PHP, a programming language that represents a fraction of the software industry, and does not need to deal with any legally binding system whatsoever, provides longer time frames, and yet this is not enough? There is no legal requirement for any project to need to use the latest version of PHP, and if you fancy it, you could create a new project written in PHP 5.2 today if you wanted. Maybe PHP and its users aren't able to cause the same level of damage as a bridge collapsing by letting potential security problems go unresolved, but that doesn't mean we can't learn a lesson from "real engineering" that benefits the project. > I also don't agree that there are "more appropriate replacements available". > The suggested `hash()` replacements for the md5/sha1* functions have the > exact same functionality, which the RFC considers "incorrect use", so what > are we actually solving by this deprecation ? Devs not having enough to do > already ? > The problem (for open source) with "force-replacing" the uses of `md5/sha1*` > functions with the `hash` function calls, is that the hash extension was not > part of PHP core until PHP 7.4, which means that for a significant number of > open source projects, the replacement is not a one-on-one function call > replacement, but needs guard code for PHP < 7.4 in case the hash extension is > not available. Reiterating what I said previously, replacing it with the one-to-one equivalent should only be done if you truly need those specific algorithms. Otherwise its usage should be reconsidered depending on the requirements and switched to something "safer". Hopefully this is clearer now that Tim amended the RFC. I can understand that userland projects and end-users work on a broad range of versions, but it is unreasonable to expect the PHP project to not do something because the situation used to be different over 5 years ago. Moreover, according to Tim, who used to work on a PHP application that people would install on shared-hosting, most hosting companies actually have optional extensions enabled such as ext/hash, ext/intl, ext/mbstring, or even ext/gmp. So with all due respect, I do not think that ext/hash not being mandatory prior to PHP 7.4 is a good counter argument. Especially, as according to Packagist statistics, 93% of users use a version of PHP that is PHP 7.4 or above, and this percentage is only going to increase. [4] > Also, having read through the RFC a second time, I find the voting choices > inconsistent - in particular the first deprecation vote, which makes the > others ambiguous. > Could each voting choice please be explicitly one of the below to prevent any > confusion ? > * Remove in PHP 8.4 > * Deprecate in PHP 8.4 and remove in PHP 9 > * Deprecate in PHP 8.4 and remove at a later date after a separate vote Unless specified otherwise, it is deprecate in 8.4 and remove in PHP 9, the other ones which specify it is for process efficiency. Best regards, Gina P. Banyard [1] https://www.exakat.io/en/ [2] https://www.php.net/manual/en/class.errorexception.php [3] https://eurocodes.jrc.ec.europa.eu/second-generation-eurocodes [4] https://standards.cencenelec.eu/dyn/www/f?p=CEN:5 [5] https://stitcher.io/blog/php-version-stats-july-2024