[racket-users] Re: prepared queries
On 3/15/17 12:41 AM, George Neuner wrote: Hi Ryan, Hope you enjoyed the snow day. Lost power here for a while, but fortunately no damage. I did :) My neighborhood didn't get much snow by volume, but what it did get was then rained/sleeted nearly into ice. On 3/13/2017 11:09 PM, Ryan Culpepper wrote: On 03/13/2017 06:30 PM, George Neuner wrote: Then there is the issue that deliberately prepared queries are not optimized as heavily as normal queries - which is the case in pretty much every DBMS that exists. Heavyweight queries that require maximum performance are harmed by preparation. In general, performing a query consists of three steps: Prepare, Bind, and Execute. When there's a "one-step" approach, it's just a convenience wrapper around those three steps. So I'd be very surprised if Prepare intrinsically slowed down a query. Maybe you mean that *parameterization* inhibits the query planner, because query planning doesn't take the values of query parameters into account. That seems to depend on the backend. PostgreSQL did planning at Prepare time before version 9.2 and does it at Bind time since 9.2 ("typically", the docs say). I don't know about other systems. But in any case, parameterization is under the control of the user. If that's not what you meant, can you elaborate? When I read this, it occurred to me that you and I have been talking at cross purposes because there are 2 distinct notions of "prepare": client side /prepare /and server side /PREPARE/ are unrelated and (mostly) not even associated. I disagree with parts of this. I would label the two notions as "client-originating prepare" and "SQL-originating PREPARE". But both of them involve allocating a resource on the server, and both allocate nearly the same kind of resource. For PostgreSQL and MySQL, "client prepare" involves sending a message (Parse and COM_STMT_PREPARE, respectively) to the server containing the query string. The server parses, checks, and maybe plans the query and stores it on the server under some identifier (in PostgreSQL, the client chooses; in MySQL, the server chooses). On the client side, a prepared statement object just wraps that identifier (along with parameter and result metadata obtained from the server). On the server side, the prepared statements show up in the relevant reflection tables (pg_prepared_statements [1] and prepared_statement_instances [2], respectively), just like those created with "SQL PREPARE", but there are fields that distinguish the two kinds. For sqlite3, as you said, the client/server distinction is irrelevant. For ODBC, the behavior is driver-dependent. A driver could possibly simulate prepared statements using only client-side resources, but it would be a lot of work to support the inspection features like SQLDescribeCol, etc. [1] https://www.postgresql.org/docs/current/static/view-pg-prepared-statements.html [2] https://dev.mysql.com/doc/refman/5.7/en/prepared-statements-instances-table.html So to put us both on the same page [and educate anyone interested] ... [... snipped pragmatics of PREPARE ...] On the client side, the /prepare/ protocol step creates local data structures for associating arguments with query parameters - structures which can be kept and reused (perhaps binding different arguments). Client /prepare/d queries persist on the client until dropped, but this is separate from - and has no effect on - server side query caching^[4] . And except in the case of a SQL EXECUTE command, it has nothing to do with server /PREPARE/d queries. I would say the two notions of prepared statement have different APIs and possibly different pragmatics (eg planning and caching), but they are both references to resources on the database back end. --- So when you [Ryan] say: If you are using `prepare` just for speed, it might help to know that most base connections have an internal statement cache that maps SQL strings to prepared statement objects. The cache is only used inside of transactions, though, to avoid issues with concurrent schema changes. and Racket's db library always prepares a statement before executing it, even if there are no query parameters. When allowed, instead of closing the prepared statement immediately after executing it, the connection stores it in a (Racket-side, per-connection) cache. The cache key is just the SQL string. If you use the same query string with different parameters, you'll still get the cached prepared statement. I now am assuming that you are talking about client side protocol data structures and NOT about any use of SQL PREPARE / EXECUTE commands. The reason I want to be clear about it is because /PREPARE/d queries persist and consume server resources until the session ends. Long running applications that create but never drop them can run afoul of server limits and suffer anything from cascading errors, to hangups or hung sessions, or even server crashes. Although the same
[racket-users] Re: prepared queries
Hi Ryan, Hope you enjoyed the snow day. Lost power here for a while, but fortunately no damage. On 3/13/2017 11:09 PM, Ryan Culpepper wrote: On 03/13/2017 06:30 PM, George Neuner wrote: Then there is the issue that deliberately prepared queries are not optimized as heavily as normal queries - which is the case in pretty much every DBMS that exists. Heavyweight queries that require maximum performance are harmed by preparation. In general, performing a query consists of three steps: Prepare, Bind, and Execute. When there's a "one-step" approach, it's just a convenience wrapper around those three steps. So I'd be very surprised if Prepare intrinsically slowed down a query. Maybe you mean that *parameterization* inhibits the query planner, because query planning doesn't take the values of query parameters into account. That seems to depend on the backend. PostgreSQL did planning at Prepare time before version 9.2 and does it at Bind time since 9.2 ("typically", the docs say). I don't know about other systems. But in any case, parameterization is under the control of the user. If that's not what you meant, can you elaborate? When I read this, it occurred to me that you and I have been talking at cross purposes because there are 2 distinct notions of "prepare": client side /prepare /and server side /PREPARE/ are unrelated and (mostly) not even associated. So to put us both on the same page [and educate anyone interested] ... --- Unless a query is submitted using a SQL PREPARE command, then most DBMS will compile and plan AS IF the query were static text with inline arguments.^[1][2] The server typically then will cache the code and plan for some time in case an identical query (both text and arguments) is submitted later. The SQL PREPARE command - which includes query text - compiles the query using indirection hooks for any parameters, in lieu of having the actual arguments available. [The compiled query is executed using the SQL EXECUTE command which supplies any needed arguments]. In most cases, a generic execution plan is generated based only on the code because specific argument values are unknown. Some DBMS will simply continue to use a generic execution plan, others replan each time based on the specific arguments, and some change behavior depending on how often different arguments are supplied. The savings (if any) come mainly from avoiding recompilation of the query for each different set of arguments. /PREPARE/d queries are not cached in the same way as are static queries. /PREPARE/d queries are never dropped automatically - they persist until explicitly dropped by the user/application, or until their owning session^[3] is terminated. On the client side, the /prepare/ protocol step creates local data structures for associating arguments with query parameters - structures which can be kept and reused (perhaps binding different arguments). Client /prepare/d queries persist on the client until dropped, but this is separate from - and has no effect on - server side query caching^[4] . And except in the case of a SQL EXECUTE command, it has nothing to do with server /PREPARE/d queries. [1] The compiler knows that arguments to be substituted for query parameters are not to be considered SQL code, and so the "expanded" query is guaranteed to be safe (for some definition) from SQL injection faults. [2] The compiler does _not_ rewrite the query text ... "substitution" of arguments is in the compiled code. [3] "Session" is not necessarily synonymous with "connection". In some DBMS, a "session" can consist of multiple simultaneous connections which can share any temporary objects - temp tables, prepared queries, etc. - that have been created. [4] Exceptions are local "library" DBMS such as SQLite, in which the "client" and "server" are together in the same process and query data structures are (or can be) shared. --- So when you [Ryan] say: If you are using `prepare` just for speed, it might help to know that most base connections have an internal statement cache that maps SQL strings to prepared statement objects. The cache is only used inside of transactions, though, to avoid issues with concurrent schema changes. and Racket's db library always prepares a statement before executing it, even if there are no query parameters. When allowed, instead of closing the prepared statement immediately after executing it, the connection stores it in a (Racket-side, per-connection) cache. The cache key is just the SQL string. If you use the same query string with different parameters, you'll still get the cached prepared statement. I now am assuming that you are talking about client side protocol data structures and NOT about any use of SQL PREPARE / EXECUTE commands. The reason I want to be clear about it is because /PREPARE/d queries persist and consume server resources until the session ends. Long
[racket-users] Re: prepared queries
On 03/13/2017 06:30 PM, George Neuner wrote: Hi Ryan, On 3/13/2017 5:43 PM, Ryan Culpepper wrote: Racket's db library always prepares a statement before executing it, even if there are no query parameters. When allowed, instead of closing the prepared statement immediately after executing it, the connection stores it in a (Racket-side, per-connection) cache. The cache key is just the SQL string. If you use the same query string with different parameters, you'll still get the cached prepared statement. To extend my previous example: > (query c "select $1::integer" 1) ** caching statement > (query c "select $1::integer" 2) ** using cached statement On the other hand, if the query string is not identical, it misses the cache. Hmm. Then what is the purpose of having separate (prepare ...)? Explicit control, history (the statement cache is much newer), and reflection (eg prepared-statement-result-types). And what happens if the unknowing user issues a PREPARE statement for the query directly to the DBMS? You mean like PostgreSQL's PREPARE/EXECUTE commands? Those should work fine too, as long as the statements are given names distinct from the obscure names chosen by the db library. Then there is the issue that deliberately prepared queries are not optimized as heavily as normal queries - which is the case in pretty much every DBMS that exists. Heavyweight queries that require maximum performance are harmed by preparation. In general, performing a query consists of three steps: Prepare, Bind, and Execute. When there's a "one-step" approach, it's just a convenience wrapper around those three steps. So I'd be very surprised if Prepare intrinsically slowed down a query. Maybe you mean that *parameterization* inhibits the query planner, because query planning doesn't take the values of query parameters into account. That seems to depend on the backend. PostgreSQL did planning at Prepare time before version 9.2 and does it at Bind time since 9.2 ("typically", the docs say). I don't know about other systems. But in any case, parameterization is under the control of the user. If that's not what you meant, can you elaborate? Moreover, prepared queries are not cached in the normal sense ... they persist until explicitly dropped, or until the connection is closed. There are both per connection and global limits on the number of prepared queries, and bad things can happen if these limits are exceeded [e.g., I've seen versions of Postgresql freeze and Sybase crash in such circumstances]. A long running application using pool connections could get into trouble. The statement cache is limited to 20 statements by default, and the statement cache is only enabled within transactions and flushed on commit/rollback. Also, connections returned to a connection pool with an open transaction are disconnected rather than reused. Ryan -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.