----- Original Message -----
From: "Karl Meissner" <[EMAIL PROTECTED]>
To: "Campbell Boucher-Burnet" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, April 08, 2002 2:54 PM
Subject: Re: [Hsqldb-developers] On The Subject Of Prepared And Callable Statements:
> >As a simple example, let's say we implement the idea
> >above and build PREPARE and EXECUTE into the syntax
>
> We are definitely talking past each other.
>
> I don't think PREPARE should be an extension of the
> syntax (in SQL). I think that the PreparedStatement (in
> java.sql.* ) interface should be supported.

But all of the JDBC interface implementations must eventually talk SQL to the database *or* they must make direct calls, in which case something approximating RMI must be used to allow this to occur for network connections. Personally, I prefer straight SQL for network connections, since it would be very hard to add and support something approximating RMI, and I really doubt there would be a great performance advantage under something like that anyway.

OTOH, we could do something like simply continuing the current approach for network connections, but supply direct methods in Database for preparing and executing prepared statements, to be used if the iType of the connection is detected to be STANDALONE or MEMORY. What do you think? I will write a quick test for this in the next day or two. This is basically how my metadata patch works in jdbcDatabaseMetaData: if a local connection is detected, a direct call to a DIJdbcLibrary method is performed, whereas if a network connection is detected, the corresponding CALL statement is made, invoking the same DIJdbcLibrary method, but against the database instance on the other end of the network connection.

> >I believe you must be talking about a special,
> >HSQL-native, non-SQL,
> >non-JDBC protocol/interface. In that case, there
>
> Read the link. Please. This is part of java.sql from
> Sun. It is standard. Here is a better link
> http://java.sun.com/j2se/1.4/docs/api/java/sql/PreparedStatement.html

Yes, I am well aware of the interface. In addition to altering the current parse/execute mechanism to produce internal command objects with an exec method, instead of executing statements in-line with parsing as we do now, all that it would take to implement this really efficiently is for us to add the method:

Database.createPreparedStatement(String statement, Session session)

This kind of PreparedStatement would be an inner class of Database, allowing us great freedom and access while still avoiding having to open up the internals to an external class. (There is a rough sketch of what I mean a little further down.)

However, be aware that there is a substantial amount of work waiting for someone who wants to move on this. Not only must Delete, Insert, and Update classes be written, but also much code must be written to identify, access, and manipulate "?" parameters. At this point, there is absolutely nothing in Parser, Expression, TableFilter, etc. yet to handle parameterized SQL. Indeed, this is why I have limited my experiments strictly to static select statements, simply taking advantage of the fact that we do currently have a Select class, as well as a method in Parser (parseSelect) that returns Select objects ready for (multiple) execution.

In my original article, I also tested a naive implementation of a Delete object. The code is posted with that article at:

http://sourceforge.net/forum/forum.php?thread_id=633995&forum_id=73673

From all of this, I realized early on that a great amount of work remains to be done to add this sort of feature to HSQL. Indeed, that is why I have been doing these advance experiments: I have been trying to determine whether the expected performance increase is worth the great deal of work.
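To make that idea a bit more concrete, here is a rough, self-contained sketch of the shape I have in mind. None of these names exist in the code base (Engine stands in for Database, Compiled stands in for the future Select/Insert/Update/Delete command objects); it is only meant to show how an inner class could expose prepare-once/execute-many without opening up the internals:

//----------------------------------------------
// Sketch only: an engine that hands out reusable prepared statements as
// instances of an inner class, so jdbcPreparedStatement could simply
// delegate to one of these without the internals becoming public API.
public class Engine {

    // Parse exactly once and hand back a reusable, parameterizable object.
    public Prepared prepare(String sql) {
        Compiled command = parseToCommand(sql);
        return new Prepared(command);
    }

    public class Prepared {
        private final Compiled command;
        private final Object[] parameters;

        Prepared(Compiled command) {
            this.command = command;
            this.parameters = new Object[command.parameterCount()];
        }

        public void setParameter(int position, Object value) {
            parameters[position - 1] = value;   // 1-based, as in JDBC
        }

        // Re-resolve the "?" slots with the current values and execute,
        // skipping the parse entirely.
        public Object execute() {
            return command.execute(parameters);
        }
    }

    // Stand-ins so the sketch compiles; this is where the real work lives.
    interface Compiled {
        int parameterCount();
        Object execute(Object[] parameters);
    }

    private Compiled parseToCommand(String sql) {
        throw new UnsupportedOperationException("parser work not done yet");
    }
}
//----------------------------------------------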
How disappointing it would be if, after months of hard work, it turned out that the performance gains were only minimal.

> > - if prepared statements are created on a database
> > instance across the
> > network,
>
> In terms of network overhead on a distributed system,
> it should be no different than a normal Statement
> object. Maybe less.

That is implementation-specific. Certainly, there are many approaches that could be taken to try to ensure that it would be less in most cases.

> > - there must still be a mechanism to submit a SQL
> > statement to be prepared,
>
> Send a query as a string to the connection.
>
> Look, this is how you could use it in a Java app
> to bulk load some records:
>
> //----------------------------------------------
> PreparedStatement pstmt = con.prepareStatement(
>     "UPDATE EMPLOYEES SET SALARY = ? WHERE ID = ?");
>
> // no more HSQL parsing from this point on
> // .....get some data.....
>
> for (int i = 0; i < 1000000; ++i) {  // a 60% speed up is huge
>     pstmt.setBigDecimal(1, read_salary(i));
>     pstmt.setInt(2, read_id(i));
>     pstmt.execute();
> }

Yes, but if the execute is inside the loop like this (rather than using batching), then the newly supplied parameters must be used to resolve the precompiled statement again and again, each time pstmt.execute is called. To get really high performance, one would need to rewrite the loop like:

for (int i = 0; i < 1000000; ++i) {  // a 60% speed up is huge
    pstmt.setBigDecimal(1, read_salary(i));
    pstmt.setInt(2, read_id(i));
    pstmt.addBatch();
}
pstmt.executeBatch();

And this would only increase performance if our command objects (Select, Insert, Update, Delete) were written such that host arrays (implemented as a Vector, most likely) could be used to set the batch values for each parameter. That, and obviously our Database inner class PreparedStatement would have to be written to build up such things in response to addBatch and pass them along accordingly.

> > precompiled statements that is shorter than
> > retransmitting their body, once
>
> Prepared statements should be a list of atomic
> operation objects. They can be stored locally or in an
> indexed registry in the server. Server side storage
> may be faster but more complicated to implement.

True enough. Actually, with the implementation suggested above, the need to reference engine-side PreparedStatements by handle is eliminated.

> > - once we have a handle to a precompiled statement,
> > we must have a mechanism
> > to indicate to the engine to (re)execute the
> > precompiled statement to which
> > the handle refers. As I see it, we have two
> > choices: use CALL
> > EXECUTE(handle) syntax, or build EXECUTE directly
> > into the syntax:
> > EXECUTE handle.
>
> Well this part is the interesting part :-)
> The output of the parse should be a list of objects.
> Each object represents a single atomic operation on
> the db. In a Statement they are executed immediately.
> In a prepared statement they are stored.

Yes. Allowing parsing of a whole SQL document into a list of precompiled statements would of course present some additional challenges, especially if a collection of parameterized statements submitted at once is to be encapsulated as a single PreparedStatement, since additional code would have to be put in place to map each external parameter position to both a list position and a parameter position within the item at that list position. Perhaps to start out, preparing should throw if more than one atomic statement is submitted.
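Just to illustrate that mapping problem, the engine side would need something along these lines (purely hypothetical, nothing like it exists yet): given the number of "?" markers in each compiled item of the list, an external 1-based position has to be translated into a (list index, local position) pair.

//----------------------------------------------
// Hypothetical sketch: translate the external, 1-based "?" position of a
// multi-statement PreparedStatement into (statement index, local position).
public class ParameterMap {

    private final int[] statementIndex;  // which list item owns external parameter i
    private final int[] localPosition;   // 1-based position of that "?" within the item

    // paramCounts[s] = number of "?" markers in compiled statement s
    public ParameterMap(int[] paramCounts) {
        int total = 0;
        for (int s = 0; s < paramCounts.length; s++) {
            total += paramCounts[s];
        }
        statementIndex = new int[total];
        localPosition  = new int[total];

        int external = 0;
        for (int s = 0; s < paramCounts.length; s++) {
            for (int p = 1; p <= paramCounts[s]; p++) {
                statementIndex[external] = s;
                localPosition[external]  = p;
                external++;
            }
        }
    }

    public int statementFor(int externalPosition) {
        return statementIndex[externalPosition - 1];
    }

    public int positionFor(int externalPosition) {
        return localPosition[externalPosition - 1];
    }
}
//----------------------------------------------

So for "INSERT INTO T VALUES(?, ?); DELETE FROM T WHERE ID = ?", the map built from {2, 1} would send external position 3 to list item 1, local position 1. Not hard, but it is one more piece of code that simply does not exist yet.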
> > Every SQL statement is atomic w.r.t. every other
> > statement in the broadest sense, so I assume you
> > really mean "low-level atomic" here.
> >
> > The only things that I can think of that are low
> > level atomic operations in hsqldb are insertNoCheck
> > and deleteNoCheck, in that
>
> Also the Expression objects

Oops. True enough, I guess, although looked at from another level, Expressions are composed.

> > each represents writing a single line to the log.
> > All other actions are composed and are typically
> > tree (rather than list) oriented.
>
> Any tree recursion can be converted to a list with a
> call stack. But that is not too important. The
> details of the data structure used, whether list or
> tree, should only matter to the internals of the Parser
> and the Statement.
>
> > If one captures the resulting low level atomic
> > operations that processing such a tree outputs, it
> > is highly unlikely that re-executing that exact list,
> > in the absence of consulting the tree and tables used
> > to generate it, would have the same result, since the
> > low level operations may be required to refer to rows
> > that themselves were deleted
>
> I think we are talking past each other again.

Yes. I think your atomic operations are things like insert, update, and delete, which can affect multiple rows, whereas mine are things like the individual insertNoCheck and deleteNoCheck calls that make up performing top level insert, update, and delete operations.

> Envision bytecode with a stack. The bytecodes are
> atomic operations. The stack holds whatever partial
> result sets are generated as operations are processed.
>
> The operations for a particular query should not
> change, only the inputs, data and stack.
>
> >Also, if new parameters for a precompiled
> >statement are substituted, then the previously
> >generated list definitely becomes invalid.
>
> The list of operations is not invalid.
> The stack of partial results would be discarded each
> execution.

OK, we really are talking past each other. If the list of operations is top-level inserts, updates, and deletes, then yes, but in that case there is no shared stack of partial results, unless we go totally crazy and implement something like the Views 'R Us approach at the link I posted in the Roadmap section:

http://www.cs.umd.edu/~nick/papers/ViewsRus.html

(Which reminds me: what the hell happened to the RoadMap section of the forums? I sure hope that stuff didn't get deleted. I have several articles and my reading list posted there, none of which I have backed up here.)

On the other hand, if we are talking about the individual operations that make up executing a statement (the tree of relational operators that makes up the execution pipeline, often referred to as the PLAN for an individual top level statement), then:

1.) the tuples in the pipeline during a previous execution may have been removed from their containing relations, or modified, by the time the next execution occurs, meaning that they no longer satisfy the predicates of the statement;

2.) changing the parameters changes the predicates and thus may invalidate the PLAN itself (once we introduce an optimizer), requiring the PLAN to be rebuilt;

3.) changing the parameters may require complex Expressions to be resolved again.

In any case, there is not really a stack of partial results shared between top level statements, and no execution-dependent things, such as gathered tuples, should hang around inside the PLAN for each top level statement between executions.
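In code terms, what I am arguing for is that the only thing a prepared statement keeps between executions is its (immutable) PLAN; everything execution-dependent is rebuilt on each call. A rough sketch, with made-up names rather than existing classes:

//----------------------------------------------
// Sketch only: the compiled PLAN is the sole state carried between
// executions; results are built from scratch against the live tables on
// every call, with the current parameter values bound in each time.
import java.util.Vector;

public class PreparedSelect {

    private final Object plan;   // immutable tree of relational operators

    public PreparedSelect(Object plan) {
        this.plan = plan;
    }

    public Vector execute(Object[] currentParameters) {
        // A fresh result for every execution: rows gathered during a previous
        // run may since have been deleted or modified, so nothing from that
        // run is consulted or kept.
        Vector result = new Vector();
        walk(plan, currentParameters, result);
        return result;
    }

    // Placeholder for the real pipeline: bind the parameter values into the
    // predicate Expressions, scan the TableFilters, collect qualifying rows.
    private void walk(Object plan, Object[] parameters, Vector result) {
    }
}
//----------------------------------------------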
> >Well, I think this is more difficult than one might
> >expect.
>
> Ok.
> It is not easy.
>
> >To provide new values to a precompiled parameterized
> >statement, one must provide a language for doing so,
> >and thus that
>
> No no no. Please read the Sun API and all will be
> clear.

All is clear, and was clear before. What I said was:

To provide new values to a precompiled parameterized statement, one must provide a language for doing so, and thus that language itself must also be parsed, or at least some internal interface to do this must be provided, above and beyond what JDBC provides on the client side of things.

By "language", I simply meant a WAY: a set of steps, some code and methods above and beyond the empty Sun interface. I guess I used the term LANGUAGE too loosely here, causing confusion. I also think I made "this" an ambiguous pronoun: by "to do this", I meant provide values, not parse a syntax extension to SQL. Sorry for being unclear.

As for the extensions to SQL I provided in the examples, my intention was to use such things as an illustration only, since it is generally easier to write some SQL pseudo code than to detail some Java interfaces and implementations. However, if it were decided to allow efficient access to such features from SQL scripts, then I think SQL syntax extensions would have to be preferred over CALLs, based on the performance I observed in the last test.

That said, it may be that CALLs are actually much slower than they need to be, based on a discussion I just had with Fred:

1.) It turns out, for instance, that CALLs to methods with Connection as the first parameter create a new Connection instance for each call to Function.getValue(), instead of simply creating a single INTERNAL connection for each Session object (was Channel, now renamed).

2.) Method objects are not cached, so each parse of a CALL instantiates a new Method object corresponding to the stored procedure.

3.) When INTERNAL connections are finalized, they call closeStandalone(), which is wrong (Fred refers to this as a phony). As such, every finalization or explicit close of an INTERNAL connection object throws an NPE, because INTERNAL connections have a null database name, and one of the first actions of closeStandalone() is to try to get() the database corresponding to the connection's database name.

Anyway, based on these observations, there are probably many places where we can speed up the processing of CALL statements, although I realize this is off topic w.r.t. your frustrated "no, no, no" exclamation.
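For 1.) and 2.), the shape of the fix is fairly obvious. This is only a sketch of the idea, not a patch against the actual Function/Session code, and the helper names are made up: cache the resolved Method per stored procedure name so the reflection lookup happens once rather than on every parse of a CALL, and keep a single lazily created INTERNAL connection per Session instead of building a new Connection on every Function.getValue().

//----------------------------------------------
// Sketch only: a Method cache keyed by "class.method", plus one reusable
// INTERNAL connection per session.
import java.lang.reflect.Method;
import java.sql.Connection;
import java.util.Hashtable;

public class CallSupport {

    // One entry per fully qualified stored procedure name.
    private static final Hashtable methodCache = new Hashtable();

    static Method getMethod(String className, String methodName)
            throws ClassNotFoundException {
        String key = className + "." + methodName;
        Method method = (Method) methodCache.get(key);
        if (method == null) {
            Method[] candidates = Class.forName(className).getMethods();
            for (int i = 0; i < candidates.length; i++) {
                // first method with a matching name; real code would also
                // check the parameter types
                if (candidates[i].getName().equals(methodName)) {
                    method = candidates[i];
                    break;
                }
            }
            if (method != null) {
                methodCache.put(key, method);
            }
        }
        return method;
    }

    // One INTERNAL connection per session, created lazily and reused for the
    // life of the session rather than once per CALL.
    private Connection internalConnection;

    Connection getInternalConnection(Object session) throws Exception {
        if (internalConnection == null) {
            internalConnection = openInternalConnection(session);
        }
        return internalConnection;
    }

    // Placeholder: the engine-specific way of opening an INTERNAL connection.
    private Connection openInternalConnection(Object session) throws Exception {
        throw new UnsupportedOperationException("engine-specific; sketch only");
    }
}
//----------------------------------------------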
> >Unfortunately, the actual number of cases where this
> >is applicable in the context of processing client (as
> >opposed to internal) work is not as high as one might
> >like.
>
> I can't speak for what other developers do, but on
> applications that I have implemented, like web pages
> backed by JSP backed by a DB, there are a few (under
> 10) queries that account for 80% to 90% of the
> activity on the DB. Typically the form of the query
> stays the same but the input values change.
>
> example
> SELECT userid FROM login_tlb WHERE passwd='blah'

Yes, but commercial databases typically also use transparent, pervasive statement caching, so once a statement is parsed the first time, it is generally looked up thereafter if possible, for the duration of a session or at least until it is booted out of the cache for some reason; hence my finding that the performance differential is not that great most of the time.

Plus, under typical OLTP loads, such statements are more likely to be spread out over time than executed in one large batch, so the perceived performance gains are nominal. Also, if a statement processes in 1 ms, one does not perceive a slowdown until more than 1000 of them are submitted per second; only then does one even notice the performance gains realized by using precompiled statements over submitting plain SQL. And if the statement takes, say, 20 ms to process, then eliminating the parse overhead and saving, say, 0.6 ms of parse time represents only a 3% saving. Indeed, if a statement takes seconds to process, bypassing the parse offers essentially no savings, relatively speaking. So the use of prepared statements has a rather narrow window of opportunity in which to make a big difference: only when executing a great number of very short running statements is it likely to provide any noticeable improvement.

As such, one is far more likely to solve JSP performance problems by clustering and expanding the cluster than by the liberal use of PreparedStatements. In fact, I'd wager that the overhead of processing the HTTP requests and the round trips between the web server and the back end would always be very large compared to the time saved by using PreparedStatements. So, even though one might be saving 40-60% per statement on the back end, the actual saving per HTTP request would be nominal.

> Taking 60% of the load off these queries would be huge.
>
> >I have found using precompiled Oracle cursors and
> >stored procedures to be most useful when providing a
> >structured and guaranteed interface to an application,
> >by turning off ad-hoc SQL access to a schema, forcing
> >everyone to go
>
> I agree 100%. I just think that is a very common,
> useful situation. But maybe not for your app.
> *shrug*

> >I do not see how providing any fine-grain control
> >over the log process can avoid logging these. Plus,
> >regardless of what one might do to improve the speed
> >of the log, I have determined that ~80% of the time
> >is spent in the lowest level (inside the actual
> >low-level calls to write out to disk) and has very
> >little to do with the volume of data written in each
> >call, depending more
>
> Exactly. Which is why we need better control over
> logging.

I meant the low level calls to the NATIVE write method in FileOutputStream. That is an area over which we have no control at all, I think, and it is highly unlikely to be helped even by JNI code, at least without turning hsql(db) into a Java interface to a mostly native code DBMS.

> There are many applications where speed is more
> important than data recovery in the event of a crash.
> For those applications a 'lazy write back' would be
> much more efficient. Only write dirty pages on cache
> flush or when the system is idle. I admit you would
> not do bank transactions this way but it would work
> great for a video game that kept game objects in a db.

In that case, you could probably save quite a bit, with close to no extra work, by using a MEMORY mode database. To load and save game objects, simply add the ability to explicitly load and persist individual tables using your memory table bulk load/save code, no? Upon attaining each level, simply save out the latest version of the table tracking game progress.
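Even before any engine-side bulk save/load of individual tables exists, a game could get by with something crude like the following, using nothing but plain JDBC. This is a sketch only: the table name is whatever the game uses, the value quoting is naive (no escaping, no date handling), and a real engine-level save would of course be much faster.

//----------------------------------------------
// Sketch: dump one table of a MEMORY database to a file as INSERT
// statements, and reload it later by replaying them.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class TableSnapshot {

    // Write every row of 'table' to 'file' as an INSERT statement.
    public static void save(Connection con, String table, File file) throws Exception {
        Statement stmt = con.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT * FROM " + table);
        ResultSetMetaData meta = rs.getMetaData();
        int cols = meta.getColumnCount();
        PrintWriter out = new PrintWriter(new FileWriter(file));
        while (rs.next()) {
            StringBuffer sql = new StringBuffer("INSERT INTO " + table + " VALUES(");
            for (int i = 1; i <= cols; i++) {
                if (i > 1) {
                    sql.append(",");
                }
                Object value = rs.getObject(i);
                if (value == null) {
                    sql.append("NULL");
                } else if (value instanceof Number) {
                    sql.append(value.toString());
                } else {
                    sql.append("'").append(value.toString()).append("'");
                }
            }
            sql.append(")");
            out.println(sql.toString());
        }
        out.close();
        rs.close();
        stmt.close();
    }

    // Clear the table and replay the saved INSERT statements.
    public static void load(Connection con, String table, File file) throws Exception {
        Statement stmt = con.createStatement();
        stmt.executeUpdate("DELETE FROM " + table);
        BufferedReader in = new BufferedReader(new FileReader(file));
        String line;
        while ((line = in.readLine()) != null) {
            if (line.length() > 0) {
                stmt.executeUpdate(line);
            }
        }
        in.close();
        stmt.close();
    }
}
//----------------------------------------------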
> >In the larger scheme, this represents only a 10%
> >reduction in the time taken in logging (since ~80% is
> >beyond our control), or around a 2% savings overall.
>
> Not if we can drive parsing to zero ;-)
>
> >. Which would you rather work on?
>
> Well I like data structures a lot...

Perhaps you could work with Fred, then, on restructuring the code base so that we use Delete, Update, and Insert objects encapsulating the data structures needed to execute these operations, instead of performing the executions in-line with parsing? This would go a long way toward providing the performance increases we would all like to see via PreparedStatements.

> ...........................................................
> : Meissner Software Development, LLC :
> : "Software development and project management services." :
> : http://meissner.v0.net/resume.htm :
> ...........................................................

_______________________________________________
hsqldb-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/hsqldb-developers