On The Subject Of Prepared And Callable Statements: Over the last few weeks, there has been occasion to discuss implementing server-side statement cursors, not only to allow updatable and efficient (engine side) scollable result sets, but also to possibly increase performance for oft used SQL statements, via pre-compiled (engine-side) prepared statements. So I decided, once again, to do some quick experiments to prove one way or another, or at least to sheld some light onto, whether preparing and then calling prepared statements could be any faster than simply parsing and executing statements directly. As has been my general experience with commercial databases, this does not seem likely, or at least the gains may end up being very slight, given the extra work and the additional bulk that will be required of the code base. The problem seems to be that the break-even point, in the sense that the % overhead is too high for short running statmements, and the % savings is too low for longer running statements. That is, the sweet spot--the area where the savings of bypassing the parse of the statement to be ececuted outweigh the overhead of the stored procedure call--is very narrow. As indicated by my ealier simple and naive experiments w.r.t. pervasively caching compiled statements against their SQL text, thus bypassing the parse phase completely upon detecting a cache hit: Performace Gains via Caching Parsed Statments: http://sourceforge.net/forum/forum.php?thread_id=633995&forum_id=73673 execution can be sped up considerably, ranging from 5-6x improvement for close to 100% cache hits on short running statements such as "select * from table where pk_column = literal_value", to 2-3x faster for slightly longer running statements such as "select a.*, b.* from table_1 as a, table_2 as b where a.column_i = literal_value and b.column_j = a.column_j" However, preparing and then calling prepared statements involves a different and inescapable mechanism: First, a CALL must be made to prepare the statement, saving it on the server under a handle (say CALL prepare(sql) returns int), then another CALL must be made to execute the prepared statement (say CALL execute(int) returns Result). As such, the call to execute the prepared statement must itself be parsed, so the advantage of bypassing the parser completely--as with pervasive statement parsing--is not reaslized. Plus, the overhead for parsing and executing a CALL statement itself is typically high, both because expression evaluation must be performed to set up the paramters to the CALL procedure and to determine the Java method to invoke, and because performing a CALL uses Java reflection to match the procedure name with a java.lang.reflect.Method, repackage all paramters using Object wrappers, and then perform a Method.invoke, rather than directly executing a Java method in bytecode. The bottom line is that the break-even point between escaping the overhead of reparsing a direct statement many times and the overhead of parsing a call to execute a prepared statement using reflective vs. direct method invocation may simply be too high. Indeed, the results below seem to indicate that just when the overhead of using a CALL to execute a prepared statement is compensated by avoiding the the complexity of reparsing the statement it executes, the actually execution (vs. parse) time of the statement starts to become the dominatating factor in the overall performance, thereby nullifying any net benefit. The Experiment: In org.hsqldb.Library, I added the following… static Hashtable hPreparedStatements = new
Hashtable();
static int prepareCount = 0;
public static int prepare(Connection
conn, String statement)
throws SQLException {
jdbcConnection
connection = (jdbcConnection) conn;
Database database =
connection.dDatabase;
Session session =
connection.cSession;
Tokenizer c = new
Tokenizer(statement);
Parser p = new
Parser(database, c, session);
String sToken =
c.getString();
if (sToken.length() ==
0) {
Trace.error(Trace.UNEXPECTED_END_OF_COMMAND);
}
if
(!"SELECT".equals(sToken)) {
Trace.error(Trace.FUNCTION_NOT_SUPPORTED,"prepare only works for Select
statements");
}
Select select =
p.parseSelect();
prepareCount++;
hPreparedStatements.put(
new Integer(prepareCount), select);
return
prepareCount;
}
public static Object
execute(Connection conn, int i)
throws SQLException {
Select select = (Select)
hPreparedStatements.get( new Integer(i) );
if (select == null)
{
Trace.error(Trace.FUNCTION_NOT_SUPPORTED,"no prepared statement for " +
i);
}
return
select.getResult(0);
}
Please note that jdbcConnection had to be altered to make its dDatabase and
cSession (was cChannel until RC4) package private instead of private. In a In
production quality implementation, obviously some otherapproach should be
taken.
In Parser, the follwing modification was made:
Result processCall() throws SQLException
{
Expression e =
parseExpression();
e.resolve(null);
int type =
e.getDataType();
Object o =
e.getValue();
// changed to allow Library
methods to return Objects that are instances of Result, as would be required
to implement prepared statements inside the engine
if (o instanceof Result)
{
return
(Result)o;
}
Result r = new
Result(1);
r.sTable[0] = "";
r.colType[0] =
type;
r.sLabel[0] = "";
r.sName[0] = "";
Object row[] = new
Object[1];
row[0] = o;
r.add(row);
return r;
}
After compiling, I started up DatabaseManager and ran the following script: -->>>TEST<<<--
;
--#1000;
CREATE TABLE Test(
Id INTEGER PRIMARY KEY,
FirstName VARCHAR(20),
Name VARCHAR(50),
ZIP INTEGER) ;
CREATE TABLE Test2(
Id INTEGER PRIMARY KEY,
FirstName VARCHAR(20),
Name VARCHAR(50),
ZIP INTEGER) ;
INSERT INTO Test
VALUES(#,'Julia','Peterson-Clancy',#r#)
;
INSERT INTO Test2
VALUES(#,'Julia','Peterson-Clancy',#r#)
;
which fills the "zip" column in both tables with random values
After that, I ran the following script to generate a bunch of prepared
statements:
-->>>TEST<<<-- ;
--#1000;
call "org.hsqldb.Library.prepare"('select * from
test a, test2 b where a.zip = #r# and b.zip = a.zip')
Then I ran these scripts several times to compare performance:
-->>>TEST<<<--
;
--#1000;
select * from test a, test2 b where a.zip = #r# and b.zip = a.zip
Results:
9454 ms : select * from test a, test2 b where a.zip = #r# and b.zip =
a.zip
8963 ms : select * from test a, test2 b where a.zip = #r# and b.zip =
a.zip
8602 ms : select * from test a, test2 b where a.zip = #r# and b.zip =
a.zip
-->>>TEST<<<-- ;
--#1000;
call "org.hsqldb.Library.execute"(#+1)
Results:
9754 ms : call "org.hsqldb.Library.execute"(#+1)
9413 ms : call "org.hsqldb.Library.execute"(#+1)
9424 ms : call "org.hsqldb.Library.execute"(#+1)
9364 ms : call "org.hsqldb.Library.execute"(#+1)
9333 ms : call "org.hsqldb.Library.execute"(#+1)
As you can see, the overhead currently associated with parsing and
executing a "CALL" statement ruins any benefit obtained by accessing a prepared
(precompiled) select statement, even a moderately complex one.
The results for calling a very simple prepared statement, such as:
SELECT * FROM TEST WHERE ID = #
were even more dismal, with the total for 1000 iterations coming in at ~
1100 ms, vs. 420 - 480 ms for simply executing the select statements
directly.
Some of this can be attributed to the fact that we do not cache
java.lang.reflect.Methods when resolving CALL statements to reflective method
invocations, and some of it can be attributed, I suppose, to the fact that
reflective invocation is supposedly slower than direct invocation (although
there is significant evidence t oshow that it is really not very much slower in
the general scheme of things). But I think much can also be attributed to the
fact that each CALL statement must itself wend its way through Parser and
Expression, before finally getting it’s precompiled select statement from the
cache and executing it. This is very much unlike pervasive statement
caching, where the entire statement (possibly consisting of many actual atomic
SQL operations, suhc as insert, update, delete, select..) is used to do a hash
lookup for a precompiled (possibly collection of) statement(s) before *any*
parse is attempted, with a cahe hit resulting in the direct execution of the the
precompiled form, completely bypassing any parse at all.
So once again, the conclusion I keep coming to (a conlusion stated on Road
Map in a post of mine showing experimentally that ~ 60% of
statement processing time for the standard test script is spent in the
parsing phase, with ~20% in actual update, and another ~20% spent in logging)
that the single best and most direct way to improve the over-all performance of
the hsql database engine for short-running statements is to really focus mostly
on speeding up and making the parser more efficient, if possible.
Of course, for longer running statements, we will still need to introduce
optimization of query execution and possibly multithreaded execution for better
throughput under a mix of long and short running transactions...but that is a
ways off and requires a complete rearchitecting, whereas there are many things
we can do right now to improve parse performance.
All comments and suggestions welcome.
Cheers,
Campbell
|
- Re: [Hsqldb-developers] On The Subject Of Prepared... Campbell Boucher-Burnet
- Re: [Hsqldb-developers] On The Subject Of Pre... Fred Toussi
- Re: [Hsqldb-developers] On The Subject Of Pre... Karl Meissner
- Re: [Hsqldb-developers] On The Subject Of... fredt
- Re: [Hsqldb-developers] On The Subject Of... Campbell Boucher-Burnet
- Re: [Hsqldb-developers] On The Subjec... Karl Meissner
- Re: [Hsqldb-developers] On The Su... Campbell Boucher-Burnet
- [Hsqldb-developers] fine grain threading Karl Meissner
- Re: [Hsqldb-developers] fine grain thread... Fred Toussi
- Re: [Hsqldb-developers] fine grain th... Karl Meissner
- Re: [Hsqldb-developers] fine grai... Karl Meissner