Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

Kamil Tekiela Tue, 22 Jun 2021 09:54:19 -0700

While it's true that a lot of the internet is using mysqli due to
WordPress, this doesn't change the fact that PHP already offers a solution
to the problem. Both PDO and mysqli support server-side prepared
statements.


We don't talk about WordPress. They should not hold back PHP. That project
is so behind that they still offer mysql_* API in their abstraction
library. Nonetheless, whatever solution we offer, they're unlikely to pick
it up. They haven't used prepared statements for whatever reason, and they
created their own client-side prepared statements. What makes you think
that if we write "Parser & Sanitizer", they're going to start using it? The
proposal for `is_literal` is going to be completely useless to WordPress as
they have no place to use it in their codebase. It's a pity we can't help
them, but it's true.

To clarify the proposal from Craig, I would like to point out that
`is_literal()` will not help with the data being passed to SQL. User input
cannot be trusted in any sense, no matter where it comes from. The proposal
for `is_literal()` helps libraries that try to build the SQL dynamically,
without the user input. For example, it would help to write a secure method
like this:

function select(mysqli $mysqli, string $table) {
if(!is_literal($table)) {
throw new RuntimeException('The table name has to be a hardcoded string');
}

return $mysqli->query("SELECT * FROM {$table}")->fetch_all();
}

This code would prevent the developer from misusing this function by
passing user input to it.

>That is why the title of this email said "Parse and Sanitizer" not
"Builder."

Then we definitely don't need that. Both PDO and mysqli can escape string
literals in a context aware manner (to some extent at least). PDO tried to
implement a parser, but it proved challenging and to this day doesn't work
correctly. As Larry explained, SQL is not a standardised language anymore.
The flavours diverged and created a widely inconsistent language.

To build a context-aware parser for SQL, you would need to know:

- The SQL flavour
- The database type (MySQL or MariaDB)
- The database version
- The settings maintained on that database server

For example, in MySQL, you need to know which character is used to escape
quotes in string literals, and you need to know the SQL_MODE. Then you need
to know which syntax is allowed and understood by that specific MySQL
version. The SQL syntax differs a lot between MySQL and MariaDB. Building,
parsing and compiling the SQL in PHP would involve rebuilding the same
parser that MySQL server is using. And you'd have to do it for every single
possible database vendor/version. On top of that, you need to have direct
access to the database/table/column/SP/functions names, so that your parser
can apply the business logic. Not to mention escaping the data if you don't
want to use prepared statements.

>OTOH, what about when a developer needs to parameterize field and table
names in PDO

You don't parameterize field and table names. Period. You hardcode these
names in your business logic. Here's an example from DBAL:

$queryBuilder
    ->select('id', 'name') // <-- Hardcoded
    ->from('users') // <-- Hardcoded
    ->where('email = ?') // <-- Hardcoded
    ->setParameter(0, $userInputEmail) // <-- Parameterized
;

Irrelevant of which parameterization technique is used here, the idea is
that you never parameterize SQL code. You parameterize the data. WordPress
does exactly the same with their prepare() method. If you need to have
dynamic field identifiers, then it's your job as a developer to write the
code that can safely handle such requirement.

A native PHP "SQL Object Model with parser and sanitizer functionality"
wouldn't prevent SQL injection in the same way that userland DB abstraction
libraries don't. You can still create SQL injection with wordpress $wpdp
class, you can do it with DBAL, and you can do it with PDO.

To summarize, we are not looking for a solution to prevent SQL injection or
to help Wordpress. We are looking for a solution that would prevent the
misuse of libraries like DBAL, so that methods like select(), from(), and
where() could generate a warning when used with non-hardcoded data.

Regards,
Kamil

P.S. By "sanitization" I understand removing unwanted information from a
piece of data. SQL injection cannot be prevented by removing data. It can
only be prevented by strict separation of SQL and data, or by careful
formatting of the data when included in the SQL string itself. This
includes quoting, type casting and escaping of quotes. Sanitization cannot
prevent SQL injection!

Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

Reply via email to