On Sun, 12 Aug 2007 02:30:45 +0200 Paul Boddie <[EMAIL PROTECTED]> wrote:
> On Sunday 12 August 2007 00:33, Mike Meyer wrote: > > On Sat, 11 Aug 2007 17:10:34 -0400 Carsten Haese <[EMAIL PROTECTED]> > wrote: > > > > > > The iron-clad, all-encompassing, golden rule is this: If something looks > > > like a parameter marker and occurs in a place where a parameter marker > > > is syntactically allowed, it's a parameter marker. Anything else isn't. > > > > Ok, I'll accept that that's iron-clad and all-encompassing. It's also > > circular. I suspect it also changes depending on the underlying > > database. > How is the above circular? If we have something, for example "?", which looks > like a parameter marker (because we've defined "?" to be a parameter marker), > and it appears in a place where such parameter markers are syntactically > allowed/possible/recognised (see the SQL specifications available on the > Internet), then our software notes that "?" in that place is a parameter > marker. Otherwise, the occurrence of "?" just forms part of some other > sequence of characters and is not considered to be such a parameter marker by > our software. How is "We only recognize parameter markers where we recognize parameter markers" *not* circular? > > > To execute a query where table names and column names are variable, you > > > should use string formatting yourself to build the structure of the > > > query, placing parameter markers into the structure where necessary, and > > > then using parameter binding to supply the actual values. > > > > But isn't one of the arguments against pyformat/format is that they > > lead to people doing string formatting to build the query, which is a > > bad thing because, unless done very carefully, they leave you exposed > > to all kinds of data injection attacks? If the other styles wind up > > *requiring* you to build the query with string formatting, how can > > they possibly be considered superior? > > Because for a lot of applications, people don't need to build up queries from > parts, or if they do, they can do it without mixing in user-supplied values. Sorry, not buying that one. "Most people don't need foo" isn't an argument as to why something that doesn't do foo might be consider superior to something that does. > Distinguishing parameter usage from string substitution reduces confusion for > beginners who don't tend to do such advanced stuff, and it makes them more > aware of the issues when they finally start doing so. I think you just pinpointed the problem: parameter substitution in dbapi is being advertised as the solution to a problem it's not really adequate to solve. IIUC, it's restricted by the underlying SQL implementation (and inherits portability problems from there as well). That some underlying SQL implementations may not support it at all further complicates things. So, in the spirit of having the module authors instead of the users do the work, how about taking this burden away from execute/executemany, and providing a tool that is adequate to the job. Here's a clean slate design proposal: execute's parameters are for access to the underlying SQL engines parameter mechanism. It takes a string and a dict or list (just as it does now). The string is passed to the database unchanged. executemany is related to execute the same way it is now. paramstyles is a list of strings indicating what parameters styles the underlying database supports. The currently recognizes styles are 'qmark', 'named' and 'numeric'. An empty list means the database doesn't support parameter substitution, and is a perfectly valid value. The module author doesn't have to provide a mechanism for doing this so people can build statements from untrusted data because we also provide: build_statement is a tool for safely creating SQL statements from untrusted data. The signature is build_statement(basestring, *args, **kwargs). Providing both *args and **kwargs is undefined. build_statement is a substitution mechanism, but not a simple string substitution. Instead, the substitution markers indicate where we substitute SQl tokens in the statement. Since tokens have types that can't be determined from the type of the value - in particular, a delimited identifier for a column name vs. a string literal in an expression - we have to have type information for the marker. So we're going to use the familiar %-notation to provide it. The possible type indicaters are: %s - produce a string literal. Values will be coerced to strings. %d,f,g - the usual numeric substitutions. %i - produce an identifier. Values must be strings. %t - produce a time literal from a datetime.datetime or None. None means to have the database substitute the current time when the statement is executed. <mike -- Mike Meyer <[EMAIL PROTECTED]> http://www.mired.org/consulting.html Independent Network/Unix/Perforce consultant, email for more information. _______________________________________________ DB-SIG maillist - DB-SIG@python.org http://mail.python.org/mailman/listinfo/db-sig