----- Original Message -----
From: "Karl Meissner" <[EMAIL PROTECTED]>
To: "Campbell Boucher-Burnet" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, April 08, 2002 7:20 AM
Subject: Re: [Hsqldb-developers] On The Subject Of Prepared And Callable Statements:

> I think we may be talking past each other.
>
> I believe you are envisioning a usage model where all
> the SQL is executed from a script. That is an
> important model and you may be correct about that
> usage model.
>
> Another way to use HSQL is from within a Java
> application or JSP. In that case you can connect
> directly to the db engine and take advantage of JDBC.
> I think in that case prepared statements have a *lot*
> of value. There is no need to use "CALL" then.

I believe you must be talking about a special, HSQL-native, non-SQL,
non-JDBC protocol/interface. In that case, there *may* be performance
advantages... I guess. But that sort of defeats the whole idea of being
a SQL RDBMS with a JDBC interface, no?

- To bypass a CALL, we would still have to build PREPARE and EXECUTE
into the SQL syntax or add a bunch of "non-standard" stuff into the JDBC
protocol, all of which still requires processing, in addition to
actually executing the precompiled statements.

- If prepared statements are created on a database instance across the
network, rather than on a local embedded database instance, there must
still be some sort of protocol, as opposed to, say, direct Database
method calls; the most natural is still the HSQL native network
protocol, transporting SQL to the engine and Results back from the
engine. Using a separate protocol is very unlikely to be of any
advantage here, and would just add bulk to the code base and complicate
the range of possible entry points into the execution engine. RMI is one
such avenue and is simple, but it is also slow for our purposes, as I
recall.

- There must still be a mechanism to submit a SQL statement to be
prepared, although the part of a statement indicating to prepare a
contained statement is likely to be very small and fast to parse or
process, compared to the contained statement itself.

- There must also still be a mechanism for looking up (referring to)
precompiled statements that is shorter than retransmitting their body,
once we have submitted them for precompilation. Typically, this is done
by returning, from the internal prepare call/statement code, an integer
handle with which future execution of the precompiled statement can be
specified.

- Once we have a handle to a precompiled statement, we must have a
mechanism to tell the engine to (re)execute the precompiled statement to
which the handle refers (see the sketch below). As I see it, we have two
choices: use CALL EXECUTE(handle) syntax, or build EXECUTE directly into
the syntax: EXECUTE handle. The latter may indeed end up being faster.
However, I will not have time to set this up and test it for a couple of
weeks, as I have committed to Fred to help out with the RC4 javadocs
update and other things...

There is typically no way to bypass SQL itself. The engine is a SQL
engine, after all, and the system as a whole supports remote (network)
clients, so short of RMI/RPC directly to native engine calls (which I
suspect would be no more efficient, really), such things are best left
going through the SQL interface to the engine, as this guarantees
consistent handling of all external interaction with the database: SQL
in, Results out.

Plus, I think for a large number of calls, RMI/RPC directly to native
engine calls would be a game of pay me now or pay me later. Perhaps the
time spent by the engine itself would be lower, but the overall time
would probably be the same, given that a bunch of marshalling and
unmarshalling would still be required and would most likely take the
bulk of the time.
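To make the handle mechanism above concrete, here is a rough sketch of
what an engine-side registry for precompiled statements might look like.
This is purely illustrative: PreparedRegistry and its methods are
hypothetical names, not actual hsqldb code, and the compiled form is
left as a plain Object (in hsqldb terms it might be a Select tree).

    // Hypothetical sketch, not actual hsqldb code: an engine-side table
    // that hands out short integer handles for precompiled statements.
    import java.util.HashMap;
    import java.util.Map;

    class PreparedRegistry {

        // handle -> compiled statement (e.g. a Select object tree)
        private final Map<Integer, Object> statements = new HashMap<>();
        private int nextHandle = 1;

        // What a "CALL PREPARE(sql)" would boil down to: the sql has
        // been parsed exactly once, and the compiled form is stored
        // under a handle that is cheap to transmit and look up.
        synchronized int prepare(Object compiledForm) {
            int handle = nextHandle++;
            statements.put(handle, compiledForm);
            return handle;
        }

        // What a "CALL EXECUTE(handle)" or "EXECUTE handle" would boil
        // down to: find the compiled form without retransmitting or
        // reparsing its body.
        synchronized Object lookup(int handle) {
            Object compiledForm = statements.get(handle);
            if (compiledForm == null) {
                throw new IllegalArgumentException("no such handle: " + handle);
            }
            return compiledForm;
        }
    }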
> > First, a CALL must be made to prepare the statement,
> > saving it on the server under a handle (say CALL
> > prepare(sql) returns int), then another CALL must be
> > made to execute the prepared statement (say CALL
> > execute(int) returns Result). As such, the call to
> > execute the prepared statement must itself be
> > parsed,
>
> Check out the API for java.sql.PreparedStatement
> http://java.sun.com/j2se/1.4/docs/api/index.html
>
> The parser is called *once*, when the
> PreparedStatement is first constructed in
> Connection.prepareStatement.
> After that, simply set the parameters and call
> executeQuery().

I believe if you take a look at my script, you will see that the
"prepare" for each statement is executed just once, thereafter calling
execute on the resulting precompiled statement. I did not mention in the
previous post that the time for parsing and executing the requests to
prepare 1000 statements was actually quite low, at about 2.5 seconds.

Obviously, for my little test I did not include the ability to create
prepared statements with IN/OUT parameters. I simply "stole" the fact
that a full (non-parametric) select statement is already represented by
instances of the Select class (inserts/updates/deletes are executed
in-line with parsing, and as such cannot be trivially precompiled and
stored without first creating classes for them as well).

> Implemented correctly,
> PreparedStatement.executeQuery() should be a list of
> atomic operations on the db. Each operation is an
> object which supports an exec() function.
> Just go over the list and execute each operation.

Every SQL statement is atomic w.r.t. every other statement in the
broadest sense, so I assume you really mean "low-level atomic" here. The
only things that I can think of that are low-level atomic operations in
hsqldb are insertNoCheck and deleteNoCheck, in that each represents
writing a single line to the log. All other actions are composed and are
typically tree (rather than list) oriented.

In the abstract sense (since we actually do some of this in-line, mixed
with parsing), executing select, insert, update, and delete statements
consists of building a tree of nodes composed of TableFilter, Select,
Expression, etc. objects and visiting them all, building results,
modifying in-memory structures, and possibly recording individual steps
in the log. If one captures the list of low-level atomic operations that
processing such a tree outputs, it is highly unlikely that re-executing
that exact list, without consulting the tree and the tables used to
generate it, would have the same result, since the low-level operations
may be required to refer to rows that were themselves deleted during the
first execution. Also, if new parameters for a precompiled statement are
substituted, then the previously generated list definitely becomes
invalid.

To me, the appropriate level of data structure at which to store
precompiled statements is that of composed objects representing entire
precompiled top-level select, insert, update, and delete statements. But
these cannot be represented as lists; their most natural representation
is tree-like.

> This would be ideal for a threaded web server app that
> must connect to the db and execute a small set of
> queries with a few parameters over and over. It
> greatly reduces the overhead of parsing to almost
> nothing.
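For reference, the client-side model being described is the standard
java.sql.PreparedStatement idiom: parse once, bind and execute many
times. A minimal sketch (the connection URL and the mytable(id, data)
schema are illustrative, and of course whether the engine underneath
actually avoids reparsing is exactly the point under discussion; today
our JDBC driver rebuilds and resubmits the SQL text for each execution):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class PreparedLoop {
        public static void main(String[] args) throws SQLException {
            // Illustrative URL/credentials; assumes mytable exists.
            Connection conn = DriverManager.getConnection(
                    "jdbc:hsqldb:mem:test", "sa", "");

            // The SQL is submitted for compilation exactly once...
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT data FROM mytable WHERE id = ?");

            // ...and each iteration only binds parameters and executes.
            for (int id = 1; id <= 1000; id++) {
                ps.setInt(1, id);
                ResultSet rs = ps.executeQuery();
                while (rs.next()) {
                    rs.getString(1);
                }
                rs.close();
            }
            ps.close();
            conn.close();
        }
    }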
> > performing a CALL uses Java reflection to match the
> > procedure name with a java.lang.reflect.Method,
>
> I don't see the need for using Reflection at all if we
> use PreparedStatement.

> > As you can see, the overhead currently associated
> > with parsing and executing a "CALL" statement ruins
> > any benefit obtained by accessing a prepared
> > (precompiled) select statement, even a moderately
> > complex one.
> >
> > ~60% of statement processing time for the standard
> > test script is spent in the parsing phase,
>
> Ouch! That whole thing can be optimized away. At
> least when you are in a JSP or Java application with a
> simple set of queries.

Well, I think this is more difficult than one might expect. To provide
new values to a precompiled parameterized statement, one must provide a
language for doing so, and thus that language itself must also be
parsed, or at least some internal interface to do this must be provided,
above and beyond what JDBC provides on the client side of things.

As a simple example, let's say we implement the idea above and build
PREPARE and EXECUTE into the syntax, bypassing the need to use CALLs:

    PREPARE INSERT INTO mytable(id, data) VALUES(?,?)
    -- returns the value "1" to the caller, a handle to the
    -- precompiled statement

Obviously, we cannot just issue EXECUTE 1; we need to specify the
unknown values. So, let's say we introduce a naive little language
extension to do so:

    WITH 1 SET 1 = someid, 2 = somedata; EXECUTE 1;
    WITH 1 SET 1 = some_other_id, 2 = some_other_data; EXECUTE 1;

Ack! Now we must actually parse and execute two statements on the engine
side of things each time we would originally have parsed and executed
one statement there (rather than just recreating the SQL using the newly
supplied parameters on the JDBC side and resubmitting the new SQL to the
engine, as we do now).

This is where batching and host arrays come in:

    PREPARE INSERT INTO mytable(id, data) VALUES(?,?)
    -- returns the value "1" to the caller, a handle to the
    -- precompiled statement
    CREATE HOST ARRAY id_array(integer) AS (someid,...,the_last_id)
    CREATE HOST ARRAY data_array(varchar) AS (somedata,...,the_last_data)
    EXECUTE 1 USING id_array FOR 1, data_array FOR 2

This is very efficient, since there are only a few parses (or, if direct
calls or RMI were provided, only a small number of method invocations)
for a potentially large amount of work, and a large number of executions
take place in the last statement, bypassing a large number of parses
(method calls) to execute the target INSERT statement, as well as
reducing network traffic in a network client scenario. However, parsing
(making calls for) the second and third statements (creating and filling
the host arrays) might actually take a fair amount of time, so even this
form might not be as fast as one might like, although it is certainly
far more efficient overall for bulk operations. Unfortunately, the
number of cases where this is applicable in the context of processing
client (as opposed to internal) work is not as high as one might like.
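Incidentally, standard JDBC already exposes a client-side form of the
bulk idea above via statement batching. A minimal sketch (same
illustrative mytable schema and URL as before; how much of the
round-trip and parse overhead a given driver actually collapses per
batch is implementation-dependent):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class BatchInsert {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection(
                    "jdbc:hsqldb:mem:test", "sa", "");

            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO mytable(id, data) VALUES(?,?)");

            // Queue many parameter vectors against the one prepared
            // statement...
            for (int id = 1; id <= 1000; id++) {
                ps.setInt(1, id);
                ps.setString(2, "row " + id);
                ps.addBatch();
            }

            // ...then submit them all in one go.
            int[] updateCounts = ps.executeBatch();

            ps.close();
            conn.close();
        }
    }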
Your MEMORY table bulk load/store code is a good example of where
internal work can be sped up very substantially and made more efficient
using such concepts, but I do not think such things translate quite as
well to more general use-cases that must go through the public,
standardized client interfaces (or even proprietary RMI): usually one
cannot arrange to call a precompiled statement using just a single
vector of parameter values, thus requiring two steps (and possibly more
using JDBC, where changing each parameter value requires an additional
call): specifying the parameter vector and then specifying to re-execute
the precompiled statement with the new parameter values.

My experience with commercial databases seems to back this up. I have
performed tests using strictly plain SQL, comparing this to using
strictly precompiled cursors and stored procedures to do the same thing
on Oracle 7 and 8i. The differences in terms of overall time taken to
perform identical tasks have always turned out to be rather minimal. I
have found precompiled Oracle cursors and stored procedures to be most
useful for providing a structured and guaranteed interface to an
application, by turning off ad-hoc SQL access to a schema and forcing
everyone to go through the stored procedures instead (as in: revoke all
on (all SQL, such as select, insert, update, delete) from (all
users)... grant execute on (my stored procedures) to (my list of users
or allowed roles for using the application)).

> > with ~20% in actual update,
> > and another ~20% spent in logging
>
> We should have more fine grain control of logging.

The only things that are really logged (besides DDL) are single-row
inserts and single-row deletes (i.e. inserts strictly of the form
"insert into table values(list of all column values)" and deletes
strictly of the form specifying a predicate of equality to the primary
key). I do not see how providing any fine-grain control over the log
process can avoid logging these.

Plus, regardless of what one might do to improve the speed of the log, I
have determined that ~80% of the time is spent at the lowest level
(inside the actual low-level calls that write out to disk) and has very
little to do with the volume of data written in each call, depending
more on the absolute number of calls (and we really do need to call
write for each log line). So even if the surrounding code were sped up
dramatically (it's not doing that much as it is) and the volume of data
written to the log were reduced (say, as was my original idea, by
reducing operation specifiers and object names to numeric ids), the best
we could hope for is a 100% reduction in that portion. Let's be more
realistic, though, and say that with a great deal of work and thought we
could reduce the time spent in our own logging code by 50% (we cannot
reduce the time in native calls to write to disk). In the larger scheme,
this represents only a 10% reduction in the time taken in logging (since
~80% of it is beyond our control), or around a 2% savings overall; the
arithmetic is spelled out at the end of this message. OTOH, all we need
to get a similar overall savings is to improve the time taken in parsing
by about 3-4%. Which would you rather work on?

> > most direct way to improve the over-all performance
> > of the hsql database engine for short-running
> > statements is to really focus mostly on speeding up
> > and making the parser more efficient, if possible.
>
> Of course this is a good goal.
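To spell out the arithmetic behind the logging-versus-parsing comparison
above (all figures are the rough estimates already quoted in this
thread):

    logging's share of total statement time:   ~20%
    part of logging spent in our own code:     ~20% of that
      (the other ~80% is the native disk write itself)
    hoped-for speedup of our logging code:      50%
    logging time saved:                         50% x 20% = 10% of logging
    overall time saved:                         10% x 20% = ~2%

    parsing's share of total statement time:   ~60%
    parser speedup needed for the same ~2%:     2% / 60% = ~3.3%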
> =====
> Karl Meissner
>
> ...........................................................
> : Meissner Software Development, LLC                      :
> : "Software development and project management services." :
> : http://meissner.v0.net/resume.htm                       :
> ...........................................................

_______________________________________________
hsqldb-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/hsqldb-developers