On 21 Apr 2022, at 15:09, Rowan Tommins <rowan.coll...@gmail.com> wrote: > On Wed, 20 Apr 2022 at 18:02, Craig Francis <cr...@craigfrancis.co.uk> wrote: >> I'm just trying to focus on how PHP has worked > > You keep repeating this mantra, but user-defined functions with declared > parameter types have never accepted nulls, in any mode, unless explicitly > marked with "?" or "= null".
Thanks Rowan, and you're right that user defined functions haven't worked like that, but that's not really what I'm focused on - which is how PHP has always worked in regards to internal functions (the bit that has changed, and is causing problems), and how NULL coercion works in other contexts. I mention user defined functions because I still want user and internal functions to work in the same way (one form of consistency). > The fact that internal functions have parameter parsing behaviour that is > almost impossible to implement in userland, and often not even consistent > between functions, is a wart of engine internals, not a design decision. Bit of a tangent, but do you have some examples? would be nice to clean some of these up, or at least have them in mind as we discuss this RFC. > All of that, and the "consistency" in the title of your RFC, is a complete > distraction from the real questions: > > 1) given a null input, and a non-nullable parameter, what should the run-time > do? > 2) what is the best way to help users update their code to be compatible with > that new behaviour? I'm happy to change the title, but consistency does apply in different ways... as in, ensuring user and internal function parameters work in the same way (even if that way might need to change slightly); how other simple values like string/int/float/bool values can still be coerced for parameters, but NULL won't be; and how NULL coercion does work in other contexts (string contact, `==` comparison, sprintf, print, array keys). To answer your questions, 1. I think NULL inputs should be coerced for non-nullable parameters, when not using `strict_types=1`, and where static analysis tools provide the additional/stricter checks (like how they can reject a string '5' for an integer parameter). With nullable parameters being different by accepting NULL without coercion. 2. With this approach, developers won't need to update their code... in contrast, the approach of NULL causing a Type Error for internal functions, developers will need to change their code, and I cannot find any tools that help with this, except very strict static analysis to find but not update them (it's difficult tracing every variable from source to sink). >> Agreed, the only thing I'd add... failing early with NULL might help debug >> some problems (assuming there is a problem), but I believe static analysis >> and unit tests are much better at doing this (e.g. static analysis is able >> to see if a variable can be NULL, and apply those stricter checks, as noted >> by George with Girgias RFC). > > > I think we should institute a "swear jar" - whenever someone says "static > analysis could do this better", they have to donate €1 to the PHP Foundation. > :P Cool, I've got 23 credits left this month :-) So I'll spend 1 more... I think it's fair to say that developers using `strict_types=1` are more likely to be using static analysis; and if `strict_types=1` is going to eventually disappear, those developers won't lose any functionality with the stricter checking being done by static analysis, which can check all possible variable types (more reliable than runtime), and (with the appropriate level of strictness) static analysis can do things like rejecting the string '5' being passed to an integer parameter and null being passed to a non-nullable parameter. > More seriously, 1) your own RFC claims that fewer than 33% of developers use > static analysis; and 2) if PHP had a compile step with mandatory static > analysis, we would still have to decide whether passing NULL to these > functions should be rejected as an error by the static analyser. Not sure how to respond to that... do you think PHP will have a compile step, with mandatory static analysis? Considering how static analysis tools today have several levels, and the ability to create a baseline (to ignore problems with old code), are you suggesting that everyone would have the strictest checks enabled, where no coercion happens at all? >> In contrast, failing early at runtime, on something that is not actually a >> problem, like the examples in my RFC, creates 2 problems (primarily >> upgrading; but also adding unnecessary code to explicitly cast those >> possible NULL values to the correct type, which isn't needed for the other >> scalar types). > > > I've been trying not to nit-pick, because I think over-all you do have a > valid point, but several of your examples do not need any extra code to > handle nulls; and those that do need maybe a handful of characters. For > instance, $search = ($_GET['q'] ?? ''); is both shorter and clearer than > $search = ($_GET['q'] ?? NULL); Thank you; and you're right, if you write new code today, you could do that, but that assumes you don't need to tell the difference between an empty value vs a missing value (I find this check is rare, but they do happen, and is why most frameworks use NULL for that distinction, WordPress being the only exception I've found so far). But, updating existing code, while that would make automated updates easier, it's likely to cause problems, because you're editing the value source, with no idea about checks later on (like your example which looks for NULL)... and that's why an automated update of existing code would have more luck updating the sinks rather than the sources (e.g. it knows which sinks are expecting a string, so it can add in a `strval($var)`, or `(string) $var`, or `$var ?? ""`). >> Thank you... but I will add, while `htmlspecialchars()` rejecting NULL would >> get you to look at the code again, I wouldn't say it's directly picking up >> the problem, and it relies on there being a NULL value in $fields for this >> to be noticed (if that doesen't happen, you're now in a position where >> random errors will start happening in production). > > > There are always going to be errors that depend on the value of input, and > unless you have god-like skill at writing tests, some of those will only > happen in production. Saying "this shouldn't happen" doesn't answer the > question of what to do when it does. Agreed, it's just that in the example given, runtime checks seemed particularly fragile (as in, the NULL check is a feature to affect the output, and while I don't expect all code to be tested, I would expect explicit features to be, which would ensure a NULL value is provided and checked, instead of relying on the developer remembering to test a NULL value). >> Bit of a tangent, I'm uncomfortable that `$name` is not being HTML encoded, >> which takes us to context aware templating engines, and how you can identify >> these mistakes via the `is_literal` RFC or the `literal-string` type. > > > This isn't real code, it's an illustrative example; but if it makes you feel > more comfortable, imagine $name has some deliberate HTML markup in it, so > needs to be echo'd raw. (/me shudders, with memories of a project that did that, and trying to remember which variables contained HTML). >> That said, if you (or anyone) have any better ideas on how to address this >> problem, please let me know. > > I honestly would be more interested in having some string functions return > null for null input than changing existing behaviour to accept null for > non-nullable parameters (interestingly, until PHP 8.0, htmlspecialchars() > could return null, e.g. if given an array). Unfortunately, that would be a > different kind of compatibility break, so I'm not sure it fully solves the > problem. I've talked to a few developers who see NULL being passed to htmlspecialchars() as something they want to be warned about, because they see NULL as an invalid value (with my RFC, they would still keep that check with `strict_types=1` and/or static analysis). Also, have a look at the quiz results: https://quiz.craigfrancis.co.uk/ It kinda confirms those discussions, while I don't think many people carefully went though the list of parameters (a bit more on the quantitative rather than qualitative side), if you focus on the people who selected option 3 first (preferring a Fatal Error for NULL), it's only Kamil and Tim who were ok with htmlspecialchars() accepting NULL, but they were not open to many of the other 334 parameters. That said, your suggestion of "?|>" could work in addition to my RFC, so the NULL value can be kept when explicitly asked to preserve (avoiding that backwards compatibility issue). Craig -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: https://www.php.net/unsub.php