> On Jun 22, 2021, at 11:41 AM, Larry Garfield <la...@garfieldtech.com> wrote:
> 
> On Tue, Jun 22, 2021, at 8:39 AM, Mike Schinkel wrote:
>>> On Jun 22, 2021, at 9:00 AM, Kamil Tekiela <tekiela...@gmail.com> wrote:
>>> 
>>> Hi Mike,
>>> 
>>> Please don't do this. We already have PDO with prepared statements. The 
>>> data must be bound. This is the secure way of writing SQL queries. 
>> 
>> The problem is that over 40% of the web currently runs on PHP code that 
>> using mysqli.  That CMS does not support PDO nor prepared statements, 
>> and is unlikely to switch to it anytime some in the foreseeable future. 
> 
> WordPress is not going to leverage anything we do here until and unless there 
> is a major change of leadership and culture at that project.  Please don't 
> waste any mental effort on it; they clearly waste no mental effort on what 
> the rest of the PHP community considers good, secure practices.  Anything 
> involving them is tilting at windmills.

You misunderstand.  What I am (likely) "wasting" my mental effort is to discuss 
features that I would be able to use WITHOUT WordPress having to make ANY 
changes.

HOWEVER, these would be beneficial beyond WordPress, since parameterized 
queries cannot parameterize table names nor field names. 

This is all trying to address the concerns that Craig Francis brought up off 
list when he said that you cannot escape or sanitize without knowing context 
when I was asking his to provide a make_literal() function, add support for a 
IsLiteral attribute, or support an IsLiteralInterface so that people don't 
latch on to using is_literal() and make certain edge cases impossible. 

When I am trying to address a problem, I come at it from many angles until I 
find a solution.  One potential solution then is to have a class built into PHP 
that can ensure the SQL returned is indeed safe. 

> Mike, speaking as someone who has written an SQL abstraction layer and query 
> builder with significant usage (Drupal 7-9), you are *grossly* 
> under-estimating the complexity of what you describe.  It might be possible 
> to hack together for SQL92, aka "what most PHP devs actually use because they 
> haven't noticed that it's not 1992 anymore", but that's already been done.  
> We have DBTNG in Drupal, we have Doctrine, problem solved.

I think what is happening here is you are making an assumption I am proposing a 
much larger scope than I am.

Think of the scope I am proposing being on par with $mysqli->prepare(), but a 
bit more to be able to handle more than just values.

> Modern SQL, though, is a stupidly complex and stupidly inconsistent beast.  
> Most of the syntax beyond the basics is different on every damned database.  
> The official spec *is not even publicly available*, and costs a lot of money 
> to access.  And no DB engine actually supports all of it; they all support 
> different subsets with their own different extensions that may or may not be 
> comparable.

Both Modern SQL and legacy SQL are both still text-based query languages  and 
they all have grammars that can be represented by BNF rules. 

https://github.com/ronsavage/SQL

Those rules could be abstracted into a form accessible via a "dialect" 
interface and that is how these would literally any version of SQL could be 
supported.

Could we finish all of a given dialect at once?  No.  Iteration based on what 
is found to be supported is how this could be approached.  Remember, these 
dialects could be implemented in userland. By any PHP developer. 

Could we ever get them to be perfect?  Probably not.  But they would be good at 
the start and then very good and there would like to be many people providing 
PRs to fix new edge cases for the dialects that get the most use.

> Building a tool that parses an arbitrary string to an AST for a spec that is 
> inconsistent, inaccessible, and not implemented correctly by anyone is a 
> fool's errand, and that's just the first part of it.  

That is not what I am proposing.  I am proposing to build a tool that knows how 
to parse based on a simplified grammar and then sanitize based on those rules.

There really are only a few contexts in a SQL query to be concerned about.  
Keywords, field names, table name, table aliases, and typed values (plus maybe 
a few other things I missed?) The grammars themselves can evolve independently 
as userland implementations of interfaces. This is not rocket science.

> That's not even getting into designing an API for people to modify it,

No API per se, an interface.  (Yes that is also an API, but not one that you 
call; instead it calls you.)

> or questions of performance,

Parsing a SQL query should not take nearly as long as executing the query 
itself, especially if the SqlObjectModel were written in C.

> or compiling the AST back into a DB-specific string

Sounds like you are envisioning something I am not.

> AND then doing parameter binding which varies from one database to another.

Again, that would be handled by the dialect class.

> You're talking about reimplementing major portions of MySQL, PostgreSQL, 
> Oracle, etc. themselves in PHP, all at the same time.  Well, good luck, 
> you're going to need it.

You seem to be focused on that fact you think this would be hard instead of if 
it would be a viable solution if implemented?

> Personally I've long since concluded that database portability is no longer 
> an achievable or even desirable feature.  SQL is just too fragmented a 
> language, leaving you with a least common denominator that is grossly 
> under-whelming for modern needs.  If you want more than SQL92, it's not 
> really viable anymore.

I agree with you.  That is why portability is not relevant to what I am 
proposing.

Maybe this will help. There are a billion XML schemas, but DomDocument and its 
related classes can process them all. A SqlObjectModel would be similar; it 
would know how to process text queries where the dialect interface implementors 
would be the equivalent of the XML schema.  

It is not a perfect analogy, but it is close enough.

-Mike

P.S. I put this at the end, because it is important.  There is no reason we 
would have to implement all syntax for a given SQL dialect.  I'll bet 5% of any 
SQL dialect is used 95% of the time.  Implement that 5% and then tell 
developers who need the other 95% of edge cases — like DML —that they have to 
sanitize those queries manually.  

And that ~5% would provide injection protection for that ~95% of most common 
use-cases.  Why let perfect be the enemy of the good?


Reply via email to