[racket-users] Re: prepared queries

2017-03-17 Thread Ryan Culpepper

On 3/15/17 12:41 AM, George Neuner wrote:

Hi Ryan,

Hope you enjoyed the snow day.  Lost power here for a while, but
fortunately no damage.


I did :) My neighborhood didn't get much snow by volume, but what it did 
get was then rained/sleeted nearly into ice.



On 3/13/2017 11:09 PM, Ryan Culpepper wrote:

On 03/13/2017 06:30 PM, George Neuner wrote:


Then there is the issue that deliberately prepared queries are not
optimized as heavily as normal queries - which is the case in pretty
much every DBMS that exists.  Heavyweight queries that require maximum
performance are harmed by preparation.


In general, performing a query consists of three steps: Prepare, Bind,
and Execute. When there's a "one-step" approach, it's just a
convenience wrapper around those three steps. So I'd be very surprised
if Prepare intrinsically slowed down a query.

Maybe you mean that *parameterization* inhibits the query planner,
because query planning doesn't take the values of query parameters
into account. That seems to depend on the backend. PostgreSQL did
planning at Prepare time before version 9.2 and does it at Bind time
since 9.2 ("typically", the docs say). I don't know about other
systems. But in any case, parameterization is under the control of the
user.

If that's not what you meant, can you elaborate?


When I read this, it occurred to me that you and I have been talking at
cross purposes because there are 2 distinct notions of "prepare":
client side /prepare /and server side /PREPARE/ are unrelated and
(mostly) not even associated.


I disagree with parts of this. I would label the two notions as 
"client-originating prepare" and "SQL-originating PREPARE". But both of 
them involve allocating a resource on the server, and both allocate 
nearly the same kind of resource.


For PostgreSQL and MySQL, "client prepare" involves sending a message 
(Parse and COM_STMT_PREPARE, respectively) to the server containing the 
query string. The server parses, checks, and maybe plans the query and 
stores it on the server under some identifier (in PostgreSQL, the client 
chooses; in MySQL, the server chooses). On the client side, a prepared 
statement object just wraps that identifier (along with parameter and 
result metadata obtained from the server). On the server side, the 
prepared statements show up in the relevant reflection tables 
(pg_prepared_statements [1] and prepared_statement_instances [2], 
respectively), just like those created with "SQL PREPARE", but there are 
fields that distinguish the two kinds.


For sqlite3, as you said, the client/server distinction is irrelevant. 
For ODBC, the behavior is driver-dependent. A driver could possibly 
simulate prepared statements using only client-side resources, but it 
would be a lot of work to support the inspection features like 
SQLDescribeCol, etc.


[1] 
https://www.postgresql.org/docs/current/static/view-pg-prepared-statements.html
[2] 
https://dev.mysql.com/doc/refman/5.7/en/prepared-statements-instances-table.html



So to put us both on the same page  [and educate anyone interested] ...
[... snipped pragmatics of PREPARE ...]

On the client side, the /prepare/  protocol step creates local data
structures for associating arguments with query parameters - structures
which can be kept and reused (perhaps binding different arguments).
Client /prepare/d queries persist on the client until dropped, but this
is separate from - and has no effect on - server side query caching^[4]
.  And except in the case of a SQL EXECUTE command, it has nothing to do
with server /PREPARE/d queries.


I would say the two notions of prepared statement have different APIs 
and possibly different pragmatics (eg planning and caching), but they 
are both references to resources on the database back end.



  ---

So when you [Ryan] say:


If you are using `prepare` just for speed, it might help to know that
most base connections have an internal statement cache that maps SQL
strings to prepared statement objects. The cache is only used inside
of transactions, though, to avoid issues with concurrent schema changes.

and

Racket's db library always prepares a statement before executing it,
even if there are no query parameters. When allowed, instead of
closing the prepared statement immediately after executing it, the
connection stores it in a (Racket-side, per-connection) cache. The
cache key is just the SQL string. If you use the same query string
with different parameters, you'll still get the cached prepared
statement.


I now am assuming that you are talking about client side protocol data
structures and NOT about any use of SQL PREPARE / EXECUTE commands.

The reason I want to be clear about it is because /PREPARE/d queries
persist and consume server resources until the session ends.  Long
running applications that create but never drop them can run afoul of
server limits and suffer anything from cascading errors, to hangups or
hung sessions, or even server crashes.

Although the same 

[racket-users] Re: prepared queries

2017-03-14 Thread George Neuner

Hi Ryan,

Hope you enjoyed the snow day.  Lost power here for a while, but 
fortunately no damage.



On 3/13/2017 11:09 PM, Ryan Culpepper wrote:

On 03/13/2017 06:30 PM, George Neuner wrote:


Then there is the issue that deliberately prepared queries are not
optimized as heavily as normal queries - which is the case in pretty
much every DBMS that exists.  Heavyweight queries that require maximum
performance are harmed by preparation.


In general, performing a query consists of three steps: Prepare, Bind, 
and Execute. When there's a "one-step" approach, it's just a 
convenience wrapper around those three steps. So I'd be very surprised 
if Prepare intrinsically slowed down a query.


Maybe you mean that *parameterization* inhibits the query planner, 
because query planning doesn't take the values of query parameters 
into account. That seems to depend on the backend. PostgreSQL did 
planning at Prepare time before version 9.2 and does it at Bind time 
since 9.2 ("typically", the docs say). I don't know about other 
systems. But in any case, parameterization is under the control of the 
user.


If that's not what you meant, can you elaborate?


When I read this, it occurred to me that you and I have been talking at 
cross purposes because there are 2 distinct notions of "prepare":  
client side /prepare /and server side /PREPARE/ are unrelated and 
(mostly) not even associated.


So to put us both on the same page  [and educate anyone interested] ...

---
Unless a query is submitted using a SQL PREPARE command, then most DBMS 
will compile and plan AS IF the query were static text with inline 
arguments.^[1][2]   The server typically then will cache the code and 
plan for some time in case an identical query (both text and arguments) 
is submitted later.


The SQL PREPARE command - which includes query text - compiles the query 
using indirection hooks for any parameters, in lieu of having the actual 
arguments available.  [The compiled query is executed using the SQL 
EXECUTE command which supplies any needed arguments].   In most cases, a 
generic execution plan is generated based only on the code because 
specific argument values are unknown.


Some DBMS will simply continue to use a generic execution plan, others 
replan each time based on the specific arguments, and some change 
behavior depending on how often different arguments are supplied.  The 
savings (if any) come mainly from avoiding recompilation of the query 
for each different set of arguments.


/PREPARE/d queries are not cached in the same way as are static queries. 
/PREPARE/d queries are never dropped automatically - they persist until 
explicitly dropped by the user/application, or until their owning 
session^[3] is terminated.



On the client side, the /prepare/  protocol step creates local data 
structures for associating arguments with query parameters - structures 
which can be kept and reused (perhaps binding different arguments).  
Client /prepare/d queries persist on the client until dropped, but this 
is separate from - and has no effect on - server side query caching^[4] 
.  And except in the case of a SQL EXECUTE command, it has nothing to do 
with server /PREPARE/d queries.



[1] The compiler knows that arguments to be substituted for query 
parameters are not to be considered SQL code, and so the "expanded" 
query is guaranteed to be safe (for some definition) from SQL injection 
faults.


[2] The compiler does _not_ rewrite the query text ... "substitution" of 
arguments is in the compiled code.


[3] "Session" is not necessarily synonymous with "connection".  In some 
DBMS, a "session" can consist of multiple simultaneous connections which 
can share any temporary objects - temp tables, prepared queries, etc. - 
that have been created.


[4] Exceptions are local "library" DBMS such as SQLite, in which the 
"client" and "server" are together in the same process and query data 
structures are (or can be) shared.

  ---


So when you [Ryan] say:

If you are using `prepare` just for speed, it might help to know that 
most base connections have an internal statement cache that maps SQL 
strings to prepared statement objects. The cache is only used inside 
of transactions, though, to avoid issues with concurrent schema changes. 

and

Racket's db library always prepares a statement before executing it,
even if there are no query parameters. When allowed, instead of
closing the prepared statement immediately after executing it, the
connection stores it in a (Racket-side, per-connection) cache. The
cache key is just the SQL string. If you use the same query string
with different parameters, you'll still get the cached prepared
statement. 


I now am assuming that you are talking about client side protocol data 
structures and NOT about any use of SQL PREPARE / EXECUTE commands.


The reason I want to be clear about it is because /PREPARE/d queries 
persist and consume server resources until the session ends.  Long 

[racket-users] Re: prepared queries

2017-03-13 Thread Ryan Culpepper

On 03/13/2017 06:30 PM, George Neuner wrote:

Hi Ryan,

On 3/13/2017 5:43 PM, Ryan Culpepper wrote:


Racket's db library always prepares a statement before executing it,
even if there are no query parameters. When allowed, instead of
closing the prepared statement immediately after executing it, the
connection stores it in a (Racket-side, per-connection) cache. The
cache key is just the SQL string. If you use the same query string
with different parameters, you'll still get the cached prepared
statement.

To extend my previous example:

  > (query c "select $1::integer" 1)
  
  ** caching statement
  
  > (query c "select $1::integer" 2)
  ** using cached statement
  

On the other hand, if the query string is not identical, it misses the
cache.


Hmm.   Then what is the purpose of having separate (prepare ...)?


Explicit control, history (the statement cache is much newer), and 
reflection (eg prepared-statement-result-types).



And what happens if the unknowing user issues a PREPARE statement for the
query directly to the DBMS?


You mean like PostgreSQL's PREPARE/EXECUTE commands? Those should work 
fine too, as long as the statements are given names distinct from the 
obscure names chosen by the db library.



Then there is the issue that deliberately prepared queries are not
optimized as heavily as normal queries - which is the case in pretty
much every DBMS that exists.  Heavyweight queries that require maximum
performance are harmed by preparation.


In general, performing a query consists of three steps: Prepare, Bind, 
and Execute. When there's a "one-step" approach, it's just a convenience 
wrapper around those three steps. So I'd be very surprised if Prepare 
intrinsically slowed down a query.


Maybe you mean that *parameterization* inhibits the query planner, 
because query planning doesn't take the values of query parameters into 
account. That seems to depend on the backend. PostgreSQL did planning at 
Prepare time before version 9.2 and does it at Bind time since 9.2 
("typically", the docs say). I don't know about other systems. But in 
any case, parameterization is under the control of the user.


If that's not what you meant, can you elaborate?


Moreover, prepared queries are not cached in the normal sense ... they
persist until explicitly dropped, or until the connection is closed.
There are both per connection and global limits on the number of
prepared queries, and bad things can happen if these limits are
exceeded  [e.g., I've seen versions of Postgresql freeze and Sybase
crash in such circumstances].   A long running application using pool
connections could get into trouble.


The statement cache is limited to 20 statements by default, and the 
statement cache is only enabled within transactions and flushed on 
commit/rollback. Also, connections returned to a connection pool with an 
open transaction are disconnected rather than reused.


Ryan

--
You received this message because you are subscribed to the Google Groups "Racket 
Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.