Re: [PHP-DEV] [RFC] is_literal

Craig Francis Sun, 13 Jun 2021 09:51:56 -0700

On Sun, 13 Jun 2021 at 07:46, Mike Schinkel <m...@newclarity.net> wrote:


> Nice!  There is an awful lot to like here.
>


Thank you.



> 1. [...] Minimally I'd ask for a separate vote on is_literal() vs.
> is_from_literal().
>


It would definitely be possible to have a vote on the name. I only went
with is_literal() because it's shorter (I'm a slow typer), and to match the
current convention of existing "is_*" functions (which all use a single
word suffix).


2. Regarding the sections "Non-Parameterised Values" and "Future Scope," if
> I understand correctly, your vision is to ultimately require all SQL and
> other vulnerable literals to be written as literals in the source code for
> them to be used or otherwise disallowed by future library code and/or PHP
> internal functionality?
>


No, fortunately that's not the case, as you're right, we do need to support
PhpMyAdmin, Adminer, systems like Drupal (which take certain values, like
the table name, from user input), and (as noted in your 4th point), there
is a lot of existing code that would break if we were militant about
requiring is_literal() for mysqli_query() and other functions.

Native functions like mysqli_query() should be covered in a future RFC, but
still only use warnings. The exact details would need to be confirmed, but
should provide a simple way to mark certain non-literal strings as trusted,
and still have a way to switch off those warnings entirely (e.g. legacy
systems).

Ideally as future scope, we would also provide a way for certain projects
to opt-in to having exceptions raised (the militant option), which would
not be appropriate for the vast majority of developers, but it's the level
I'd find useful for some of my projects (dealing with medical records and
other sensitive information).

As to how we could (in a future RFC) mark certain non-literal strings as
trusted, I've briefly touched on how this can work in the "Future Scope"
section (it's an idea I copied from Trusted Types in JavaScript, which
follows the same principle). I only did a summary because the exact
implementation would need to be discussed later (there are a few options,
but they all rely on is_literal() being implemented first).

First, we need libraries to use is_literal() to check inputs from
developers (i.e. the code they didn't write), which is their main source of
injection vulnerabilities. What goes on inside the library is entirely up
to them (they know that they are doing). For their output, they create a
stringable value-object, which (as future scope) could be marked as trusted
for certain functions (like mysqli_query).

These value-objects could be as simple as a single private property
(value), a __construct() method that takes the libraries output, and a
__toString() method to return it; or the library might want to be extra
sure of its output, and use something like the Query Builder example linked
in the "Non-Literal Values" section.

As future scope, this allows functions like mysqli_query() to simply check
for a literal, or one of these trusted value-objects (or have warnings
switched off entirely).


3. I notice your RFC does not grant sprintf() the ability to return a
> string with the "literal" flag even though sprintf() can frequently return
> strings composed from known literals and safe numeric value.
>


Joe has created a `literal_sprintf()` function for testing, I think it does
exactly what you're suggesting, and I believe that logic can be applied
directly to sprintf().

The only reason I've been hesitant is to keep this implementation of
is_literal() limited to identifying literals and supporting concatenation
(to keep it simple, and easy to use).

It's like the "String Splitting" part in the RFC, it's probably fine, and
we can do it if there's interest; but we also need to consider what issues
it might cause (if any). From a security point of view, it's always best to
keep something as simple as possible (and that may well include supporting
sprintf), because "You cannot prove security. You can only prove
insecurity".



> 4. Regarding the section "WHERE IN" you argue that we should push everyone
> to use parameters properly — which is a great idea in theory — but that
> would have a significant breakage in existing sites online.
>


Hopefully my answer to #2 addresses this, as you're right, that would
definitely cause more problems than it solves. Certainly for this
implementation I support libraries providing warnings and think those would
be the most appropriate, rather than exceptions. (Though obviously
libraries know their user bases best of course).

In summary of my future scope answer above, libraries could be able to mark
their output as trusted for functions like mysqli_query(). For SQL it would
ideally still be a literal string, but that's not always possible
(especially with HTML), so that's where the trusted stringable
value-objects would be used. For legacy systems, developers could simply
switch off the warnings entirely (but at least that's a choice they are
actively choosing to make).



> 5. Given #2 and #4, I think your section "Support Functions" is missing
> [...] as_literal() or make_literal() or even as_unsafe_literal()
>


I'm not against it, but I don't think it's needed (and with it raising
warnings not exceptions it won’t be necessary to prevent things breaking).

In the Example Library I made under the "Try It" section in the RFC:

https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4

It checks for literals but also accepts a value-object allowing you to
bypass that check (which I think is probably the best way to implement this
at the moment). It shows how an `unsafe_value` value-object could be used
and accepted by the example library, in the same way DB::raw() works in
Laravel. In the future a similar setup in PHP’s native functions could
accept literals or value-objects which can be trusted, so we shouldn’t need
a way to lie to PHP and tell it to treat one as the other.


6. Regarding the section "Reflection API," shouldn't their be an
> is_literal()/is_from_literal() method added to the ReflectionParameter()
> and ReflectionProperty() classes?
>


If I can just check we are talking about the same thing, so where we have:

  class my_class {
    private string|int $my_property = 0;
    private function my_method(string|int $my_parameter) {
    }
  }

  $reflector = new ReflectionClass('my_class');
  $property = $reflector->getProperty('my_property');
  $method = $reflector->getMethod('my_method');
  $parameter = $method->getParameters()[0];

  var_dump(
      $property->getType()->__toString(),
      $parameter->getType()->__toString(),
    );

The ReflectionParameter/ReflectionProperty allows us to return their
respective types (ReflectionType). In this case, it's saying these two are
"string|int", which isn't the current value, just what they can accept.

I think this is touching on an area that Joe mentioned yesterday, "a
possible future where we have first class support" in a more intricate way.
I think there is value in this, as it would allow PHP itself to reject
non-literal values, but that should be a separate RFC, as that's a bit more
of a language change, and will need its own discussion.


So assuming you address at least #5 — and thus #2 and #4 indirectly — I
> would really like to see this RFC pass.
>
> -Mike
>


Thank you Mike

Craig

Re: [PHP-DEV] [RFC] is_literal

Reply via email to