On Tue, May 21, 2013 at 7:30 PM, M.-A. Lemburg <m...@egenix.com> wrote:
> However, escaping doesn't strike me as one of the important > details, since I have yet to find an example where you'd > actually need escaping :-) Uhm, I thought I'd shown a few of them. But they may have been scattered around in the long thread and spinoffs. So, my spanners in the machine are: > First of all, SQL string literals should not be subject to > marker parsing. They must be skipped and care must be taken > to make sure that the '-quoting is being respected ('' -> '). I've shown in a separate thread that postgres defines other string literals than the 'quoted' ones. For instance the multiline ones, using $$these$$ as delimiters, that may also be named and nested. There are of course comments too to take care of. And "quoted" identifiers. And on and on. Even the layman knowledge that quotes are escaped in sql by doubling them is shaken by the postgres parser: the string "a'b" can be expressed as E'a\'b'. In short, parsing postgres syntax requires a well built tokenizer on the client, an excessive requirement for a generic client-side library (especially one that is not supposed to "understand" the query being sent but whose only role is to pass it to the server). > * qmark case: question marks don't appear outside SQL string > literals, so you don't need to escape them - they will > always refer to binding markers In postgres ? is a valid operator, or part of valid operators. The following query reports there are 30 such operators predefined in postgres 9.1: =# select format('%s %s %s', oprleft::regtype, oprname, oprright::regtype) from pg_operator where oprname ~ E'\\?'; abstime <?> tinterval path ?# path box ?# box ... There are hstore operators <http://www.postgresql.org/docs/9.1/static/hstore.html#AEN135244>, geometric operators <http://www.postgresql.org/docs/9.1/static/functions-geometry.html>, and extension modules can -and do- add others <http://pgmp.projects.pgfoundry.org/mpz.html#division-operators-and-functions>. > * named case: it should be enough to make sure that the colon > of a named marker is not preceded by another colon > (negative look-behind). If it is, the potential substring > is not a named marker, e.g. "abc::integer < 3" is not a named > marker, "abc < :integer" is a named marker If this works, I can offer a life jacket proposing to always use the : as escape symbol, escaping ? as :? Don't know yet if it's valid: I'm not practical with negative look-behind parsing. Definitely doesn't look a familiar way to escape stuff. > However, I think we can leave those deliberately underspecified > in DB-API 3.0 to both encourage use of one of the mandatory > paramstyles, and to allow database module authors to continue > exposing natively used paramstyles. > > Something we do need to address in a paramstyle spec for > qmark and named is SQL comments. Perhaps easiest would be > to disallow them in SQL statements passed to .execute*(). It is definitely an "easy" route, but it leaves us with drivers that by spec may forbid the use of certain database features: I feel it an excessive price to pay in the name of consistency. Again, sorry if I'm repetitive, but note that many of the problems stem by a sort of implied feature request: that a driver can support both qmark and named placeholders. I see how a driver supporting both sequences and maps as parameters is handier than one only offering a single one, but I think that precisely the qmark/named pair is a dreadful combination to force into cohabitation, and this is the main force leading to unnatural requirements such as the impossibility to express some of the valid characters in the query. Even taking the hard case of postgres: if qmark was the only chosen style, single qmarks would become placeholders and double qmarks would become single (as operators, in literals, in comments, everywhere, without the need of a lexer). If instead only named was the chosen one people should write casts as a::::int, but the system would be provable correct. It's when both are allowed that escaping becomes an hairy thing. If the dbapi3 requirement becomes to mandate support for a single paramstyle things become manageable again. -- Daniele _______________________________________________ DB-SIG maillist - DB-SIG@python.org http://mail.python.org/mailman/listinfo/db-sig