On Sun, 13 Jun 2021 at 07:46, Mike Schinkel <m...@newclarity.net> wrote:
> Nice! There is an awful lot to like here. > Thank you. > 1. [...] Minimally I'd ask for a separate vote on is_literal() vs. > is_from_literal(). > It would definitely be possible to have a vote on the name. I only went with is_literal() because it's shorter (I'm a slow typer), and to match the current convention of existing "is_*" functions (which all use a single word suffix). 2. Regarding the sections "Non-Parameterised Values" and "Future Scope," if > I understand correctly, your vision is to ultimately require all SQL and > other vulnerable literals to be written as literals in the source code for > them to be used or otherwise disallowed by future library code and/or PHP > internal functionality? > No, fortunately that's not the case, as you're right, we do need to support PhpMyAdmin, Adminer, systems like Drupal (which take certain values, like the table name, from user input), and (as noted in your 4th point), there is a lot of existing code that would break if we were militant about requiring is_literal() for mysqli_query() and other functions. Native functions like mysqli_query() should be covered in a future RFC, but still only use warnings. The exact details would need to be confirmed, but should provide a simple way to mark certain non-literal strings as trusted, and still have a way to switch off those warnings entirely (e.g. legacy systems). Ideally as future scope, we would also provide a way for certain projects to opt-in to having exceptions raised (the militant option), which would not be appropriate for the vast majority of developers, but it's the level I'd find useful for some of my projects (dealing with medical records and other sensitive information). As to how we could (in a future RFC) mark certain non-literal strings as trusted, I've briefly touched on how this can work in the "Future Scope" section (it's an idea I copied from Trusted Types in JavaScript, which follows the same principle). I only did a summary because the exact implementation would need to be discussed later (there are a few options, but they all rely on is_literal() being implemented first). First, we need libraries to use is_literal() to check inputs from developers (i.e. the code they didn't write), which is their main source of injection vulnerabilities. What goes on inside the library is entirely up to them (they know that they are doing). For their output, they create a stringable value-object, which (as future scope) could be marked as trusted for certain functions (like mysqli_query). These value-objects could be as simple as a single private property (value), a __construct() method that takes the libraries output, and a __toString() method to return it; or the library might want to be extra sure of its output, and use something like the Query Builder example linked in the "Non-Literal Values" section. As future scope, this allows functions like mysqli_query() to simply check for a literal, or one of these trusted value-objects (or have warnings switched off entirely). 3. I notice your RFC does not grant sprintf() the ability to return a > string with the "literal" flag even though sprintf() can frequently return > strings composed from known literals and safe numeric value. > Joe has created a `literal_sprintf()` function for testing, I think it does exactly what you're suggesting, and I believe that logic can be applied directly to sprintf(). The only reason I've been hesitant is to keep this implementation of is_literal() limited to identifying literals and supporting concatenation (to keep it simple, and easy to use). It's like the "String Splitting" part in the RFC, it's probably fine, and we can do it if there's interest; but we also need to consider what issues it might cause (if any). From a security point of view, it's always best to keep something as simple as possible (and that may well include supporting sprintf), because "You cannot prove security. You can only prove insecurity". > 4. Regarding the section "WHERE IN" you argue that we should push everyone > to use parameters properly — which is a great idea in theory — but that > would have a significant breakage in existing sites online. > Hopefully my answer to #2 addresses this, as you're right, that would definitely cause more problems than it solves. Certainly for this implementation I support libraries providing warnings and think those would be the most appropriate, rather than exceptions. (Though obviously libraries know their user bases best of course). In summary of my future scope answer above, libraries could be able to mark their output as trusted for functions like mysqli_query(). For SQL it would ideally still be a literal string, but that's not always possible (especially with HTML), so that's where the trusted stringable value-objects would be used. For legacy systems, developers could simply switch off the warnings entirely (but at least that's a choice they are actively choosing to make). > 5. Given #2 and #4, I think your section "Support Functions" is missing > [...] as_literal() or make_literal() or even as_unsafe_literal() > I'm not against it, but I don't think it's needed (and with it raising warnings not exceptions it won’t be necessary to prevent things breaking). In the Example Library I made under the "Try It" section in the RFC: https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4 It checks for literals but also accepts a value-object allowing you to bypass that check (which I think is probably the best way to implement this at the moment). It shows how an `unsafe_value` value-object could be used and accepted by the example library, in the same way DB::raw() works in Laravel. In the future a similar setup in PHP’s native functions could accept literals or value-objects which can be trusted, so we shouldn’t need a way to lie to PHP and tell it to treat one as the other. 6. Regarding the section "Reflection API," shouldn't their be an > is_literal()/is_from_literal() method added to the ReflectionParameter() > and ReflectionProperty() classes? > If I can just check we are talking about the same thing, so where we have: class my_class { private string|int $my_property = 0; private function my_method(string|int $my_parameter) { } } $reflector = new ReflectionClass('my_class'); $property = $reflector->getProperty('my_property'); $method = $reflector->getMethod('my_method'); $parameter = $method->getParameters()[0]; var_dump( $property->getType()->__toString(), $parameter->getType()->__toString(), ); The ReflectionParameter/ReflectionProperty allows us to return their respective types (ReflectionType). In this case, it's saying these two are "string|int", which isn't the current value, just what they can accept. I think this is touching on an area that Joe mentioned yesterday, "a possible future where we have first class support" in a more intricate way. I think there is value in this, as it would allow PHP itself to reject non-literal values, but that should be a separate RFC, as that's a bit more of a language change, and will need its own discussion. So assuming you address at least #5 — and thus #2 and #4 indirectly — I > would really like to see this RFC pass. > > -Mike > Thank you Mike Craig