[Hsqldb-developers] On The Subject Of Prepared And Callable Statements:

Campbell Boucher-Burnet Sat, 06 Apr 2002 04:24:35 -0800

On The Subject Of Prepared And Callable Statements:

Over the last few weeks, there has been occasion to discuss implementing server-side statement cursors, not only to allow updatable and efficient (engine side) scollable result sets, but also to possibly increase performance for oft used SQL statements, via pre-compiled (engine-side) prepared statements.

So I decided, once again, to do some quick experiments to prove one way or another, or at least to sheld some light onto, whether preparing and then calling prepared statements could be any faster than simply parsing and executing statements directly. As has been my general experience with commercial databases, this does not seem likely, or at least the gains may end up being very slight, given the extra work and the additional bulk that will be required of the code base. The problem seems to be that the break-even point, in the sense that the % overhead is too high for short running statmements, and the % savings is too low for longer running statements. That is, the sweet spot--the area where the savings of bypassing the parse of the statement to be ececuted outweigh the overhead of the stored procedure call--is very narrow.

As indicated by my ealier simple and naive experiments w.r.t. pervasively caching compiled statements against their SQL text, thus bypassing the parse phase completely upon detecting a cache hit:

Performace Gains via Caching Parsed Statments:

http://sourceforge.net/forum/forum.php?thread_id=633995&forum_id=73673

execution can be sped up considerably, ranging from 5-6x improvement for close to 100% cache hits on short running statements such as "select * from table where pk_column = literal_value", to 2-3x faster for slightly longer running statements such as "select a.*, b.* from table_1 as a, table_2 as b where a.column_i = literal_value and b.column_j = a.column_j" However, preparing and then calling prepared statements involves a different and inescapable mechanism:

First, a CALL must be made to prepare the statement, saving it on the server under a handle (say CALL prepare(sql) returns int), then another CALL must be made to execute the prepared statement (say CALL execute(int) returns Result). As such, the call to execute the prepared statement must itself be parsed, so the advantage of bypassing the parser completely--as with pervasive statement parsing--is not reaslized. Plus, the overhead for parsing and executing a CALL statement itself is typically high, both because expression evaluation must be performed to set up the paramters to the CALL procedure and to determine the Java method to invoke, and because performing a CALL uses Java reflection to match the procedure name with a java.lang.reflect.Method, repackage all paramters using Object wrappers, and then perform a Method.invoke, rather than directly executing a Java method in bytecode.

The bottom line is that the break-even point between escaping the overhead of reparsing a direct statement many times and the overhead of parsing a call to execute a prepared statement using reflective vs. direct method invocation may simply be too high. Indeed, the results below seem to indicate that just when the overhead of using a CALL to execute a prepared statement is compensated by avoiding the the complexity of reparsing the statement it executes, the actually execution (vs. parse) time of the statement starts to become the dominatating factor in the overall performance, thereby nullifying any net benefit.

The Experiment:

In org.hsqldb.Library, I added the following…

static Hashtable hPreparedStatements = new Hashtable();

static int prepareCount = 0;

public static int prepare(Connection conn, String statement)

throws SQLException {

jdbcConnection connection = (jdbcConnection) conn;

Database database = connection.dDatabase;

Session session = connection.cSession;

Tokenizer c = new Tokenizer(statement);

Parser p = new Parser(database, c, session);

String sToken = c.getString();

if (sToken.length() == 0) {

Trace.error(Trace.UNEXPECTED_END_OF_COMMAND);

}

if (!"SELECT".equals(sToken)) {

Trace.error(Trace.FUNCTION_NOT_SUPPORTED,"prepare only works for Select statements");

}

Select select = p.parseSelect();

prepareCount++;

hPreparedStatements.put( new Integer(prepareCount), select);

return prepareCount;

}

public static Object execute(Connection conn, int i)

throws SQLException {

Select select = (Select) hPreparedStatements.get( new Integer(i) );

if (select == null) {

Trace.error(Trace.FUNCTION_NOT_SUPPORTED,"no prepared statement for " + i);

}

return select.getResult(0);

}

Please note that jdbcConnection had to be altered to make its dDatabase and cSession (was cChannel until RC4) package private instead of private. In a In production quality implementation, obviously some otherapproach should be taken.

In Parser, the follwing modification was made:

Result processCall() throws SQLException {

Expression e = parseExpression();

e.resolve(null);

int type = e.getDataType();

Object o = e.getValue();

// changed to allow Library methods to return Objects that are instances of Result, as would be required to implement prepared statements inside the engine

if (o instanceof Result) {

return (Result)o;

}

Result r = new Result(1);

r.sTable[0] = "";

r.colType[0] = type;

r.sLabel[0] = "";

r.sName[0] = "";

Object row[] = new Object[1];

row[0] = o;

r.add(row);

return r;

}

After compiling, I started up DatabaseManager and ran the following script:

-->>>TEST<<<-- ;

--#1000;

CREATE TABLE Test(

Id INTEGER PRIMARY KEY,

FirstName VARCHAR(20),

Name VARCHAR(50),

ZIP INTEGER) ;

CREATE TABLE Test2(

Id INTEGER PRIMARY KEY,

FirstName VARCHAR(20),

Name VARCHAR(50),

ZIP INTEGER) ;

INSERT INTO Test

VALUES(#,'Julia','Peterson-Clancy',#r#) ;

INSERT INTO Test2

VALUES(#,'Julia','Peterson-Clancy',#r#) ;

which fills the "zip" column in both tables with random values

After that, I ran the following script to generate a bunch of prepared statements:

-->>>TEST<<<-- ;

--#1000;

call "org.hsqldb.Library.prepare"('select * from test a, test2 b where a.zip = #r# and b.zip = a.zip')

Then I ran these scripts several times to compare performance:

-->>>TEST<<<-- ;

--#1000;

select * from test a, test2 b where a.zip = #r# and b.zip = a.zip

Results:

9454 ms : select * from test a, test2 b where a.zip = #r# and b.zip = a.zip

8963 ms : select * from test a, test2 b where a.zip = #r# and b.zip = a.zip

8602 ms : select * from test a, test2 b where a.zip = #r# and b.zip = a.zip

-->>>TEST<<<-- ;

--#1000;

call "org.hsqldb.Library.execute"(#+1)

Results:

9754 ms : call "org.hsqldb.Library.execute"(#+1)

9413 ms : call "org.hsqldb.Library.execute"(#+1)

9424 ms : call "org.hsqldb.Library.execute"(#+1)

9364 ms : call "org.hsqldb.Library.execute"(#+1)

9333 ms : call "org.hsqldb.Library.execute"(#+1)

As you can see, the overhead currently associated with parsing and executing a "CALL" statement ruins any benefit obtained by accessing a prepared (precompiled) select statement, even a moderately complex one.

The results for calling a very simple prepared statement, such as:

SELECT * FROM TEST WHERE ID = #

were even more dismal, with the total for 1000 iterations coming in at ~ 1100 ms, vs. 420 - 480 ms for simply executing the select statements directly.

Some of this can be attributed to the fact that we do not cache java.lang.reflect.Methods when resolving CALL statements to reflective method invocations, and some of it can be attributed, I suppose, to the fact that reflective invocation is supposedly slower than direct invocation (although there is significant evidence t oshow that it is really not very much slower in the general scheme of things). But I think much can also be attributed to the fact that each CALL statement must itself wend its way through Parser and Expression, before finally getting it’s precompiled select statement from the cache and executing it. This is very much unlike pervasive statement caching, where the entire statement (possibly consisting of many actual atomic SQL operations, suhc as insert, update, delete, select..) is used to do a hash lookup for a precompiled (possibly collection of) statement(s) before *any* parse is attempted, with a cahe hit resulting in the direct execution of the the precompiled form, completely bypassing any parse at all.

So once again, the conclusion I keep coming to (a conlusion stated on Road Map in a post of mine showing experimentally that ~ 60% of statement processing time for the standard test script is spent in the parsing phase, with ~20% in actual update, and another ~20% spent in logging) that the single best and most direct way to improve the over-all performance of the hsql database engine for short-running statements is to really focus mostly on speeding up and making the parser more efficient, if possible.

Of course, for longer running statements, we will still need to introduce optimization of query execution and possibly multithreaded execution for better throughput under a mix of long and short running transactions...but that is a ways off and requires a complete rearchitecting, whereas there are many things we can do right now to improve parse performance.

All comments and suggestions welcome.

Cheers,

Campbell

[Hsqldb-developers] On The Subject Of Prepared And Callable Statements:

Reply via email to