> On Dec 18, 2014, at 3:57 PM, Michael Bayer <mike...@zzzcomputing.com> wrote:
> 
>> 
>> On Dec 18, 2014, at 3:39 PM, M.-A. Lemburg <m...@egenix.com> wrote:
>> 
>> 
>>> That would make the entire feature a non-starter for me.    SQLAlchemy 
>>> doesn’t hold cursors open beyond a single statement.    My users would very 
>>> much want a prepared-statement-per-transaction object.
>> 
>> Perhaps you ought to reconsider this approach. Creating and closing
>> cursors all the time does involve somewhat of an overhead.
> 
> I will attempt to try this, though I am anticipating that DBAPIs are going to 
> be problematic with this approach.    One concrete example is the case where 
> on psycopg2, we offer the option to use a “named” cursor, which on psycopg2 
> has the effect of maintaining the state of this cursor on the server side. 
> However psycopg2 throws an error if such a cursor is used for anything other 
> than a SELECT statement. So right there, we need more than one cursor based 
> on the contents of the SQL.   This is kind of a very specific situation 
> though, I’ll see if the approach in general produces issues.

So I tried this, and pleasantly, there’s not *too* much side effect, meaning a 
quick test against a few databases didn’t lead to many issues.   The issues 
that did appear were in the tests relating to connection invalidation within a 
2pc context; I didn’t dig into what they are, but it’s possible that the MySQL 
and psycopg2 DBAPIs have some more quirks with cursors when 2pc is used (or my 
changes were just missing some edge cases).

However, if I were to change this for real, it means that small hit to 
stability now gets sent out to everyone, on databases I don’t even have 
regular access to such as Sybase and DB2, with whatever quirks of reusing 
cursors I’ve not been able to test.   Many years of effort and user feedback 
have gone into getting our Connection class to be stable and predictable in an 
extremely wide variety of situations (and here we’re talking about failure 
modes: disconnects, deadlocks, timeouts, intercepting these conditions 
perfectly and getting the system back into a stable state as efficiently and 
crash-free as possible).   Against all of that, we’re weighing the potential 
overhead of opening and closing many cursors, rather than keeping one around 
for…I would presume the transaction scope.   

This is exactly what I was getting at in my other email.  We are considering a 
significant change in a key area of stability in the name of “reducing 
overhead”, so is it really worth it?    For the drivers that the vast majority 
of my users care about at least, the effect would appear to be negligible, 
hitting barely a 1% difference with the pure Python drivers that have much 
bigger performance problems just by being in pure Python:

psycopg2 single cursor: 6.159881 (10000 executions)
psycopg2 multi cursor: 6.173749 (10000 executions)

pg8000 single cursor: 28.213494 (1000 executions)
pg8000 multi cursor: 28.620359 (1000 executions)

mysqldb single cursor: 11.702930 (10000 executions)
mysqldb multi cursor: 11.809935 (10000 executions)

mysql connector single cursor: 25.707400 (1000 executions)
mysql connector multi cursor: 26.096313 (1000 executions)
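
(For anyone who wants to try the same comparison without any of these drivers 
installed, here’s a rough stand-in using the stdlib sqlite3 module.  sqlite3 is 
an in-process database, so the numbers are not comparable to the driver 
timings above; this is just a sketch of the measurement itself:)

```python
import random
import sqlite3
import timeit

# stdlib stand-in for the benchmark above: same measurement shape, but
# sqlite3 is in-process, so per-cursor overhead is proportionally smaller
conn = sqlite3.connect(":memory:")
conn.execute("create table data (id integer primary key, data VARCHAR(100))")
conn.executemany(
    "insert into data (id, data) values (?, ?)",
    [(i + 1, "some value: %d" % random.randint(0, 10000)) for i in range(1000)],
)

def run_single(cursor):
    # reuse the one cursor that was opened up front
    cursor.execute("select * from data")
    cursor.fetchall()

def run_multi(conn):
    # open and close a fresh cursor for each statement
    cursor = conn.cursor()
    cursor.execute("select * from data")
    cursor.fetchall()
    cursor.close()

cursor = conn.cursor()
single = timeit.timeit(lambda: run_single(cursor), number=500)
multi = timeit.timeit(lambda: run_multi(conn), number=500)
print("sqlite3 single cursor: %f (500 executions)" % single)
print("sqlite3 multi cursor: %f (500 executions)" % multi)
```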

I also had the idea that maybe the above cases don’t show much because these 
drivers aren’t using true “server side” cursors in the first place; I know in 
ODBC, we have more of a formal “cursor” construct in the protocol and that is 
probably what you’re referring to (though per 
http://msdn.microsoft.com/en-us/library/ms130794.aspx, explicit cursors are 
rarely used and ODBC automatically opens a cursor for individual result 
sets).   So I decided to try psycopg2 with a “named” cursor, and got this:

        psycopg2.ProgrammingError: can't call .execute() on named cursors more than once

Wow!  So that’s definitely that :).

If my DB2 users upgrade to SQLAlchemy 1.0 and start experiencing less stable 
behavior with connections, they’d be fairly upset if I told them it was in the 
name of a 0.3% overhead improvement.    As in the case of prepared statements, 
whether server-side cursors can truly be safely recycled is highly dependent 
on the database and the DBAPI methodology in use, and again the DBAPI could 
offer this as a configurable feature not exposed on the outside (the way ODBC 
can offer transparent connection pooling, if you will).
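
(To sketch what such a configurable, internal-only feature might look like: 
the following is purely hypothetical — the RecyclingConnection class and 
reuse_cursor flag are made up for illustration, and sqlite3 stands in for a 
real driver:)

```python
import sqlite3

class RecyclingConnection(object):
    """Hypothetical wrapper: when reuse_cursor is enabled, every
    .cursor() call hands back the same underlying DBAPI cursor, so
    application code is identical either way."""

    def __init__(self, dbapi_conn, reuse_cursor=False):
        self._conn = dbapi_conn
        self._reuse = reuse_cursor
        self._cursor = None

    def cursor(self):
        if not self._reuse:
            # default behavior: a brand new cursor per call
            return self._conn.cursor()
        if self._cursor is None:
            self._cursor = self._conn.cursor()
        return self._cursor

    def close(self):
        if self._cursor is not None:
            self._cursor.close()
        self._conn.close()

conn = RecyclingConnection(sqlite3.connect(":memory:"), reuse_cursor=True)
c1 = conn.cursor()
c1.execute("create table t (x integer)")
c1.execute("insert into t values (1)")
c2 = conn.cursor()              # transparently the same cursor as c1
c2.execute("select x from t")
print(c2.fetchall())            # → [(1,)]
```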




import random
import timeit
import psycopg2
import pg8000
import MySQLdb
from mysql import connector as myconnpy


def setup(conn):
    # build a small table of sample rows for the SELECT benchmark
    cursor = conn.cursor()
    cursor.execute("drop table if exists data")
    cursor.execute(
        "create table data (id integer primary key, data VARCHAR(100))")
    cursor.executemany(
        "insert into data (id, data) values (%s, %s)",
        [
            (counter + 1, "some value: %d" % random.randint(0, 10000),)
            for counter in range(1000)
        ]
    )
    conn.commit()
    cursor.close()


def run_test_multi_cursor(conn):
    # open and close a brand new cursor for each statement
    cursor = conn.cursor()
    cursor.execute("select * from data")
    cursor.fetchall()
    cursor.close()


def run_test_single_cursor(conn, cursor):
    # reuse the one cursor that was opened up front
    cursor.execute("select * from data")
    cursor.fetchall()

def do_time(dbapi, single_cursor, style, number):
    # conn (and cursor, below) are globals so that timeit's setup
    # string can import them from __main__
    global conn
    if style == 'postgresql':
        conn = dbapi.connect(
            user='scott', password='tiger',
            database='test', host='localhost')
    elif style == 'mysql':
        conn = dbapi.connect(
            user='scott', passwd='tiger',
            db='test', host='localhost')

    if single_cursor:
        global cursor
        cursor = conn.cursor()
        time = timeit.timeit(
            "run_test_single_cursor(conn, cursor)",
            "from __main__ import run_test_single_cursor, setup, "
            "conn, cursor; setup(conn)",
            number=number
        )
        cursor.close()
    else:
        time = timeit.timeit(
            "run_test_multi_cursor(conn)",
            "from __main__ import run_test_multi_cursor, "
            "setup, conn; setup(conn)",
            number=number
        )

    conn.close()
    return time

psycopg2_cursor_time = do_time(psycopg2, True, 'postgresql', 10000)
psycopg2_non_cursor_time = do_time(psycopg2, False, 'postgresql', 10000)
pg8000_cursor_time = do_time(pg8000, True, 'postgresql', 1000)
pg8000_non_cursor_time = do_time(pg8000, False, 'postgresql', 1000)
mysqldb_cursor_time = do_time(MySQLdb, True, 'mysql', 10000)
mysqldb_non_cursor_time = do_time(MySQLdb, False, 'mysql', 10000)
mysqlconn_cursor_time = do_time(myconnpy, True, 'mysql', 1000)
mysqlconn_non_cursor_time = do_time(myconnpy, False, 'mysql', 1000)


print("psycopg2 single cursor: %f (10000 executions)" % psycopg2_cursor_time)
print("psycopg2 multi cursor: %f (10000 executions)" % psycopg2_non_cursor_time)
print("pg8000 single cursor: %f (1000 executions)" % pg8000_cursor_time)
print("pg8000 multi cursor: %f (1000 executions)" % pg8000_non_cursor_time)
print("mysqldb single cursor: %f (10000 executions)" % mysqldb_cursor_time)
print("mysqldb multi cursor: %f (10000 executions)" % mysqldb_non_cursor_time)
print("mysql connector single cursor: %f (1000 executions)" % mysqlconn_cursor_time)
print("mysql connector multi cursor: %f (1000 executions)" % mysqlconn_non_cursor_time)



> 
>> Note that the use of cached prepared cursors for performance
>> reasons is only one use of having the prepare step available
>> on cursors. In practice, the more important one is to be able
>> check SQL statements for errors without executing them and
>> possibly causing a rollback on the transaction.
> 
> Which kinds of errors are you referring to?  If the statement has not been 
> invoked, I would imagine this refers only to syntax errors.  What kind of 
> application contains SQL with syntax errors that only become apparent at 
> runtime and can’t be eliminated during testing?   
> 
> 
> 
> 
>> 
>> -- 
>> Marc-Andre Lemburg
>> eGenix.com
>> 
>> Professional Python Services directly from the Source  (#1, Dec 18 2014)
>>>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>>>> mxODBC Plone/Zope Database Adapter ...       http://zope.egenix.com/
>>>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
>> ________________________________________________________________________
>> 2014-12-11: Released mxODBC Plone/Zope DA 2.2.0   http://egenix.com/go67
>> 
>> ::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::
>> 
>>  eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
>>   D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
>>          Registered at Amtsgericht Duesseldorf: HRB 46611
>>              http://www.egenix.com/company/contact/
> 
> _______________________________________________
> DB-SIG maillist  -  DB-SIG@python.org
> https://mail.python.org/mailman/listinfo/db-sig
