Hi Craig, > On Jun 12, 2021, at 1:00 PM, Craig Francis <cr...@craigfrancis.co.uk> wrote: > > Hi Internals, > > I'd like to start the discussion on the is_literal() RFC: > > https://wiki.php.net/rfc/is_literal
Nice! There is an awful lot to like here. And few bits of concern. What's to like? ------------------- 1. Adding a proactive method for guarding against injection vulnerabilities would be a huge step forward for PHP. This should not be underemphasized. 2. That you added the section "Previous Work" and covered what other languages are doing in this regard. 3. The "Thanks" section and all the external references and prior comments that provides insight into how this RFC was developed. 4. And I love that you publicly stated these points in an RFC: a. Why not use static analysis? It will never be used by most developers. b. Why not educate everyone? You can't - developer training simply does not scale, and mistakes still happen. What's of concern? ------------------------- (Note: the last point could address concerns #2 and #4.) 1. Bike-shedding a bit, but I find a bit of cognitive dissonance over the semantic incorrectness of is_literal(), similar to Jakob Givoni's concern. I also fear it will be confusing for developers who know what a literal is, and thus I agree with Jakob. Minimally I'd ask for a separate vote on is_literal() vs. is_from_literal(). 2. Regarding the sections "Non-Parameterised Values" and "Future Scope," if I understand correctly, your vision is to ultimately require all SQL and other vulnerable literals to be written as literals in the source code for them to be used or otherwise disallowed by future library code and/or PHP internal functionality? If I did not misunderstand I believe you would disallow an important class of functionality from being implemented in PHP, including several existing and widely used applications such as but not limited to PhpMyAdmin and Adminer that open and interact with arbitrary SQL databases. It would also disallow using any data returned by APIs where the PHP application cannot know in advance what data will be acquired, even if that data has been properly escaped and/or sanitized. 3. I notice your RFC does not grant sprintf() the ability to return a string with the "literal" flag even though sprintf() can frequently return strings composed from known literals and safe numeric value. And I have frequently see it used for this purpose, e.g.: $fields = ['id','name']; $table = 'my_table'; $sql = sprintf( 'SELECT %s FROM %s WHERE ID=%d', implode( ',', $fields ), $my_table, intval($_GET['id'])); Of course your section "WHERE IN" "addresses" this, but please see #4 below. I argue sprintf() should be able to pass thru literal flags and validate integer values to determine that the result is literal. Am I missing something here? 4. Regarding the section "WHERE IN" you argue that we should push everyone to use parameters properly — which is a great idea in theory — but that would have a significant breakage in existing sites online. While many here in Internals seem to look down on WordPress if does power over 40%[1] of the web at this point and this would break most if not all those sites if the site owners or their hosts were to upgrade to a version of PHP that was militant about requiring is_literal() for mysqli_query()[2]. And yes, WordPress could upgrade their core code[3] to address this, but that would not also fix the 60k plugins and 10k themes on WordPress.org <http://wordpress.org/>, or likely the million+ combined custom plugins and themes in use on the web. Is it really reasonable to change PHP to just break of all those sites so that we can force all the site owners to beg and plead with their plugin and theme developers to better protect them from injection errors? Seems that medicine might be worse than the cure, at least for all those site owners. 5. Given #2 and #4, I think your section "Support Functions" is missing what would allow WordPress, PhpMyAdmin, Adminer and all other PHP applications to address the concerns in #2 and #4 and it is neither literal_concat() nor literal_implode(). The function we would need is the one Lauri Kenttä wrote (in jest?), but we'd need one that is performant, not a userland version. Name it as_literal() or make_literal() or even as_unsafe_literal() — whichever chosen is unimportant IMO — but applications like the ones I mentioned need to be able to say "I absolutely know what I am explicitly doing so do not throw a warning or an error telling me my value is `not a literal`, dammit!" Certainly most people would not use such a method by accident, especially if called something like `as_unsafe_literal()`. There might even be ways to validate that this was not used by a copy-and-paste artist, but I will leave that validation to other people who are smarter than me. 6. Regarding the section "Reflection API," shouldn't their be an is_literal()/is_from_literal() method added to the ReflectionParameter() and ReflectionProperty() classes? So assuming you address at least #5 — and thus #2 and #4 indirectly — I would really like to see this RFC pass. -Mike [1] https://w3techs.com/blog/entry/40_percent_of_the_web_uses_wordpress <https://w3techs.com/blog/entry/40_percent_of_the_web_uses_wordpress> [2] https://www.php.net/manual/en/mysqli.query.php#refsect1-mysqli.query-parameters <https://www.php.net/manual/en/mysqli.query.php#refsect1-mysqli.query-parameters> [3] https://github.com/WordPress/WordPress/blob/master/wp-includes/wp-db.php#L2056 <https://github.com/WordPress/WordPress/blob/master/wp-includes/wp-db.php#L2056>