On Monday, 8 July 2024 at 04:04, Juliette Reinders Folmer 
<php-internals_nos...@adviesenzo.nl> wrote:
> On 2-7-2024 20:05, Gina P. Banyard wrote:
>> On Tuesday, 2 July 2024 at 10:52, Juliette Reinders Folmer 
>> <php-internals_nos...@adviesenzo.nl> wrote:
>>
>>> * While a number of proposals include an impact analysis (thank you!), a 
>>> significant number of the proposals don't.
>>> It would be appreciated if for those proposals which aren't removing 
>>> unused/unusable functionality, some sort of impact analysis was added.
>>
>> You will need to clarify which ones you are talking about.
>> These "bulk removal" RFCs are written by various authors over the course of 
>> a year, and might not have been looked at for 9+ months.
>
> I'd suggest for an impact analysis/expected impact statement to be added to 
> the following deprecation proposals:

I am going to start this reply with the following:
An impact analysis showing a large impact to userland, is not in itself, an 
argument against a deprecation.
What an impact analysis helps to determine is the length of the deprecation and 
the timeline for removal.

It is getting exhausting to need to provide this, when what it is, is me asking 
Damien to check usage on the corpus of over 3100 projects, some open source 
(such as Wordpress, Drupal, OSS accounting software, etc.), top 1000 composer 
packages, and the private codebases he has access via his company, using his 
Exakat static analysis tool. [1]
The corpus is 160 MLOC (1.2 Billion tokens), 1.4 M files and as already 
mentioned over 3100 distinct projets.

But his tool will sometimes report duplicates, and has outdated versions which 
might not be affected by the issues anymore.
One reason is that some projects inline composer dependencies, and unless I do 
a painstaking manual review I cannot narrow this down.
Especially as it takes time to run the analysis on the corpus, and if I don't 
ask the precise question I don't get all the relevant stats.

So every stat is a conservative approximation.

We don't decide to deprecate and remove things for the fun of it.
But if something is misleading, badly designed, dangerous, has a security risk, 
or causes issues it should be deprecated.
It is my belief that it does not matter if this affects 10, 10 000, or 10 000 
000 codebases.
However, how and when we remove this, yes this is affected by the usage.

> * session.sid_length and session.sid_bits_per_character

Auditing INI setting usages is effectively impossible with Exakat.

Misusing these settings can lead to security issues,
and the new values will match the existing defaults.

I would guess that the majority of users don't even know about this setting and 
thus are not affected.
Similarly, it seems likely that application developers are also not aware of it,
causing applications to break if a hosting provider would adjust these settings.
For example: if the application expects it to be a specific format, which is 
defined in the database schema.

Considering the above, these INI settings should be removed and deprecated, 
regardless of impact.

> * xml_set_object() and xml_set_*_handler() with string method names

This behaviour is unintuitive and breaks all usual language semantics.
This should be deprecated and removed regardless of impact.
But when I was working on this I had asked Damien to run some analysis with 
Exakat and found 66 projects.
To which I have sent PRs to some to remove the usage, which is extemely simple 
to do.

To clarify the rationale, the following code is ambiguous:
`xml_set_element_handler($p, 'strrev')`

It either calls the \strrev() string function, or a method called `strrev` on 
the object provided by a call to `xml_set_object()`.
This is going to be the logic as of PHP 8.4 after some refactoring I did last 
October.
In the current released versions it is even more ambiguous, as the object 
provided by xml_set_object() could be passed *after* setting the string 
callable.

This behaviour is totally unintuitive, so regardless of the impact it should be 
removed.

> * Deprecate proprietary CSV escaping mechanism

This is a follow-up on an RFC whose first step was implemented in PHP 7.4. 
(https://wiki.php.net/rfc/kill-csv-escaping)
The first step was implemented (https://github.com/php/php-src/pull/3515) 
without a vote being held following the discusion on internals: 
https://externals.io/message/103268

This routinely bites people, and we still get issues about people being 
confused about this parameter.
We really should address this, and not wait yet another 5 years for complaints 
to once again be raised before we take any action.

> * Deprecate strtok() function

Symfony agrees with the rationale provided by the RFC and has banned the 
function from their project: https://github.com/symfony/symfony/issues/57542
This seems to indicate that the rationale around it is sound.
But just for the sake of it, I asked Damien, and I don't have the total number 
of usages, just that at least one call to strtok is made in 274 projects.
I have no idea which projects, and whether the majority of them are in a bunch 
of libraries or not.

> * Deprecate returning non-strings values from a user output handler
> * Deprecate producing output in a user output handler

This is hard to analyse as it depends on runtime execution.
However, the current behaviour when doing one of these things is questionable 
and/or broken.
And I firmly believe this should be deprecated/changed regardless of impact.

> * Deprecate mysqli_refresh()
> * Deprecate mysqli_kill()

These are following upstream deprecations from MySQL.

> * Deprecate lcg_value()

This function is effectively broken.
Thus, I do not see what benefit we get from an impact analysis.

> * Deprecate md5(), sha1(), md5_file(), and sha1_file() (add an actual 
> analysis, not just a statement as this is a high impact proposal)

To circle back to the beginning, what does a detailed analysis brings us here?
Tim is aware this has potential to impact a lot of code, which is why it is 
*explicitly* not being slated for removal in 9.0, and would require a follow-up 
RFC to remove it.

Moreover, this is in the same vein as when we deprecated utf8_decode() and 
utf8_encode() in PHP 8.2:
https://wiki.php.net/rfc/remove_utf8_decode_and_utf8_encode

Tim slightly adjusted the wording of the RFC to make it clearer that the 
suggested replacements are only intended for users that are locked into the 
algorithm choice.
I struggle to see a good reason to use MD5 in 2024, and I would hope that 
no-one uses MD5 to hash passwords in 2024, but somehow I doubt that.

I'm also trusting Tim to implement the deprecation message, and the changelog 
entry on the manual, in a way that prompts users to re-evaluate their choice of 
algorithm rather then blindly using the hash extension with MD5/SHA1.

But I did ask Damien, and he has told me that for each function there are that 
many projects that use the function at least once.
I don't have any idea if it stems from a library, if they only used it once or 
10 000 times in the project, nor for what purpose.

md5: 862
sha1: 495
sha1_file : 85
md5_file : 245

> * Deprecate passing E_USER_ERROR to trigger_error()

This is to limit usage and access to the bailout mechanism, and better 
alternatives exist and should be used.
This is prime example of deprecations being the correct tool.

> * Deprecate SOAP_FUNCTIONS_ALL constant and passing it to 
> SoapServer::addFunction()

To me, this is a security issue first and foremost, and therefore we should 
discourage its use and remove it.
However, once again, I've asked Damien to run a quick analysis and 182 projects 
use it, mainly Symfony and Drupal.

> And to a lesser degree for:
> * Formally deprecate Soft-deprecated DOMDocument and DOMEntity properties

We are following the DOM Spec here.
Thus I don't see how an impact analysis is useful.

> * Deprecate SplFixedArray::__wakeup()

This never worked properly.
Thus I don't see how an impact analysis is useful.

> * Deprecate passing null and false to dba_key_split()

This also never worked properly and is a bug.
Thus I don't see how an impact analysis is useful.

> * Deprecate passing incorrect data types for options to ext/hash functions

This is potential security issue, and only possible to know at runtime, so 
regardless of impact this should be removed.
However, Niels did add such an analysis to the RFC.

> * Constants SUNFUNCS_RET_STRING, SUNFUNCS_RET_DOUBLE, SUNFUNCS_RET_TIMESTAMP

This is a follow-up on a deprecation enacted in PHP 8.1, and arguably should 
have been done at the same time,
cf. https://wiki.php.net/rfc/deprecations_php_8_1#date_sunrise_and_date_sunset

> * Remove E_STRICT error level and deprecate E_STRICT constant

This error level had only 2 strange uses in PHP 7, and has been completely 
removed in PHP 8.
I don't see what benefit an impact analysis would bring here, we are just 
deprecating/removing cruft at this point.

> * mysqli_ping() and mysqli::ping()

This is broken as of PHP 8.2.
Thus I don't see how an impact analysis is useful.

> P.S.: typo in "xml_set_object() and xml_set_*_handler() with string method 
> names": "witch" => "which"

Fixed

>>> Other than that, I join the previously voiced objections to the deprecation 
>>> of `uniqid()`, `md5()`, `sha1()`, `md5_file()`, `sha1_file()`.
>>> While I acknowledge that these functions _can_ be used inappropriately for 
>>> security-sensitive code, which should use alternative methods, these 
>>> functions have perfectly valid use-cases for non-security-sensitive code 
>>> and the impact of the BC-break of deprecating and eventually removing these 
>>> methods can, IMO, not be justified.
>>> Keep in mind that while "we" know and understand that deprecations are not 
>>> errors, end-users often don't and particularly for open source projects, 
>>> this means that in practice these deprecations will need to be addressed 
>>> anyway to reduce the noise of users opening issues about them, which 
>>> without a clear path to removal of the functions, will, in a lot of cases, 
>>> mean adding the `@` operator to all uses.
>>
>> If I may be a bit cheeky, if we consider that userland does not understand 
>> that deprecations are not errors, how can we trust them to use the 5 
>> aforementioned functions correctly?
>> Especially as there are more appropriate replacements available.
>
> There is a difference between "userland" (dev-users) and end-users. I was 
> talking about end-users, while based on your remark, you are talking about 
> dev-users.

I am unsure what you mean by "end-users" here, I am going to assume you mean 
PHP developers that write PHP code using PHP libraries and/or frameworks.
Because if you refer to "end-users" as people that install WordPress (or 
whatever PHP application) via something like CPanel, this is a totally 
different conversation.

I sincerely appreciate that you are very much a tooling and library ecosystem 
developer, but from a core developer PoV that someone is an "end-user" or a 
"dev-user" is not a practical distinction.
We cannot make a function available only to "dev-users" that know what they do, 
we need to consider the whole userbase, and arguably end-users are the largest 
proportion of this.
Thus if end-users do not understand that deprecations are not errors, how 
should I expect end-users to read the documentation?
A deprecation is loud and clear, the documentation can be easily ignored, and 
there is no way to verify if the aforementioned functions are used correctly.

I will add, I very much dislike the argument "but it is clearly explained on 
the documentation, so it is not a problem".
I do not know about most people, but frankly, I do not look up the 
documentation everytime I want to use a function, similarly to me not having 
consulted a dictionary to verify the meaning of the words I'm using before 
typing them.
In the same way that human languages will "deprecate" words by discouraging 
their usage, we ought to do the same for the language we write our code in.

If the issue is that you, as a maintainer, don't have a page on php.net to 
point to users where we clearly say "Promoting deprecations to Exceptions is 
wrong and bad" then we can make such a page.
Tim made a PR, which is now live [2], to the ErrorException page changing the 
example to a correct error handler promoting warnings and notices to exceptions 
but not deprecations.

However, if even creating such a page is not enough, then maybe we need to do 
some engine level changes where we properly split out deprecations from the 
other diagnostics being emitted,
so that it becomes *impossible* for people to promote deprecations to 
exceptions (which somehow I'm thinking that people like Marco and Nicolas would 
appreciate).
These are all conversations that we can have.
However, "stop deprecating things" is not a "solution" to people not 
understanding what deprecations are.
This has come up again and again, and the answer has been constant, it is 
unacceptable to tell the project to not deprecate things.
It is one of our limited tools to actually make changes to the language, 
removing this is not an option.

Because if we do remove this option, I can definitely see people starting to 
create their own flavours of PHP to fix stuff that apparently must be set in 
stone in the official language.
Which I don't think many people want to see this.
And, considering that most of the deprecations are in *extensions* this is, yet 
again, reinforcing my opinion that we should unbundle all extensions so that 
they can move at their own pace and that users can install whatever version of 
said extension they want.

Moreover, if you permit me to do an aside from another industry.
The construction industry in Europe is going through a massive overhaul via the 
second generation of Eurocodes. [3]
All of the final drafts were submitted prior to October of 2023, and will be 
finalized, translated into German/French, and voted on prior to 30 March 2026.
These new standards will then be implemented at a national level by all 34 
members of CEN [4] via their national standard body (e.g. the British Standard 
Institute for the UK) by 30 September 2027.
Finally, the previous versions of the standard *must* be withdrawn by 30 March 
2028.
Therefore, if the final version of a standard is published at the latest 
possible time, there is at most a two year transition period for a large part 
of an *industry* and, at minimum, 34 national standardization bodies to adopt 
the new standard, deprecate and withdraw the old one.
It is possible to use the new Eurocodes prior to them becoming mandated at the 
national level (which the member countries of CEN are obligated to do), but 
once that step is taken using the previous Eurocodes will require a legal 
exemption.

Meanwhile, PHP, a programming language that represents a fraction of the 
software industry, and does not need to deal with any legally binding system 
whatsoever, provides longer time frames, and yet this is not enough?
There is no legal requirement for any project to need to use the latest version 
of PHP, and if you fancy it, you could create a new project written in PHP 5.2 
today if you wanted.
Maybe PHP and its users aren't able to cause the same level of damage as a 
bridge collapsing by letting potential security problems go unresolved, but 
that doesn't mean we can't learn a lesson from "real engineering" that benefits 
the project.

> I also don't agree that there are "more appropriate replacements available".
> The suggested `hash()` replacements for the md5/sha1* functions have the 
> exact same functionality, which the RFC considers "incorrect use", so what 
> are we actually solving by this deprecation ? Devs not having enough to do 
> already ?
> The problem (for open source) with "force-replacing" the uses of `md5/sha1*` 
> functions with the `hash` function calls, is that the hash extension was not 
> part of PHP core until PHP 7.4, which means that for a significant number of 
> open source projects, the replacement is not a one-on-one function call 
> replacement, but needs guard code for PHP < 7.4 in case the hash extension is 
> not available.

Reiterating what I said previously, replacing it with the one-to-one equivalent 
should only be done if you truly need those specific algorithms.
Otherwise its usage should be reconsidered depending on the requirements and 
switched to something "safer".
Hopefully this is clearer now that Tim amended the RFC.

I can understand that userland projects and end-users work on a broad range of 
versions, but it is unreasonable to expect the PHP project to not do something 
because the situation used to be different over 5 years ago.
Moreover, according to Tim, who used to work on a PHP application that people 
would install on shared-hosting, most hosting companies actually have optional 
extensions enabled such as ext/hash, ext/intl, ext/mbstring, or even ext/gmp.

So with all due respect, I do not think that ext/hash not being mandatory prior 
to PHP 7.4 is a good counter argument.
Especially, as according to Packagist statistics, 93% of users use a version of 
PHP that is PHP 7.4 or above, and this percentage is only going to increase. [4]

> Also, having read through the RFC a second time, I find the voting choices 
> inconsistent - in particular the first deprecation vote, which makes the 
> others ambiguous.
> Could each voting choice please be explicitly one of the below to prevent any 
> confusion ?
> * Remove in PHP 8.4
> * Deprecate in PHP 8.4 and remove in PHP 9
> * Deprecate in PHP 8.4 and remove at a later date after a separate vote

Unless specified otherwise, it is deprecate in 8.4 and remove in PHP 9, the 
other ones which specify it is for process efficiency.

Best regards,

Gina P. Banyard

[1] https://www.exakat.io/en/
[2] https://www.php.net/manual/en/class.errorexception.php
[3] https://eurocodes.jrc.ec.europa.eu/second-generation-eurocodes
[4] https://standards.cencenelec.eu/dyn/www/f?p=CEN:5
[5] https://stitcher.io/blog/php-version-stats-july-2024

Reply via email to