----- Original Message -----
From: "Karl Meissner" <[EMAIL PROTECTED]>
To: "Campbell Boucher-Burnet" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, April 08, 2002 7:20 AM
Subject: Re: [Hsqldb-developers] On The Subject Of Prepared And Callable Statements:

> I think we may be talking past each other.
>
> I believe you are envisioning a usage model where all
> the SQL is executed from a script. That is an
> important model and you may be correct about that
> usage model.
>
> Another way to use HSQL is from within a Java
> application or JSP. In that case you can connect
> directly to the db engine and take advantage of JDBC.
> I think in that case prepared statements have a *lot*
> of value. There is no need to use "CALL" then.

I believe you must be talking about a special, HSQL-native, non-SQL,
non-JDBC protocol/interface. In that case, there *may* be performance
advantages... I guess. But that sort of defeats the whole idea of being
a SQL RDBMS with a JDBC interface, no?

- To bypass a CALL, we would still have to build PREPARE and EXECUTE
into the SQL syntax or add a bunch of "non-standard" stuff into the JDBC
protocol, all of which still requires processing, in addition to
actually executing the precompiled statements.

- If prepared statements are created on a database instance across the
network, rather than on a local embedded database instance, there must
still be some sort of protocol, as opposed to, say, direct Database
method calls; the most natural is still the HSQL native network
protocol, transporting SQL to the engine and Results back from the
engine. Using a separate protocol is very unlikely to be of any
advantage here, and would just add bulk to the code base and complicate
the range of possible entry points into the execution engine. RMI is one
such avenue and is simple, but it is also slow for our purposes, as I
recall.

- There must still be a mechanism to submit a SQL statement to be
prepared, although the part of a statement indicating to prepare a
contained statement is likely to be very small and fast to parse or
process, compared to the contained statement itself.

- There must also still be a mechanism for looking up (referring to)
precompiled statements that is shorter than retransmitting their body,
once we have submitted them for precompilation. Typically, this is done
by returning, from the internal prepare call/statement code, an integer
handle with which future execution of the precompiled statement can be
specified.

- Once we have a handle to a precompiled statement, we must have a
mechanism to tell the engine to (re)execute the precompiled statement to
which the handle refers (see the sketch below). As I see it, we have two
choices: use CALL EXECUTE(handle) syntax, or build EXECUTE directly into
the syntax: EXECUTE handle. The latter may indeed end up being faster.
However, I will not have time to set this up and test it for a couple of
weeks, as I have committed to Fred to help out with the RC4 javadocs
update and other things...

There is typically no way to bypass SQL itself. The engine is a SQL
engine, after all, and the system as a whole supports remote (network)
clients, so short of RMI/RPC directly to native engine calls (which I
suspect would be no more efficient, really), such things are best left
going through the SQL interface to the engine, as this guarantees
consistent handling of all external interaction with the database: SQL
in, Results out.

Plus, I think for a large number of calls, RMI/RPC directly to native
engine calls would be a game of pay me now or pay me later. Perhaps the
time spent by the engine itself would be lower, but the overall time
would probably be the same, given that a bunch of marshalling and
unmarshalling would still be required and would most likely take the
bulk of the time.
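To make the handle mechanism above concrete, here is a rough sketch of
what an engine-side registry for precompiled statements might look like.
This is purely illustrative: PreparedRegistry and its methods are
hypothetical names, not actual hsqldb code, and the compiled form is
left as a plain Object (in hsqldb terms it might be a Select tree).

    // Hypothetical sketch, not actual hsqldb code: an engine-side table
    // that hands out short integer handles for precompiled statements.
    import java.util.HashMap;
    import java.util.Map;

    class PreparedRegistry {

        // handle -> compiled statement (e.g. a Select object tree)
        private final Map<Integer, Object> statements = new HashMap<>();
        private int nextHandle = 1;

        // What a "CALL PREPARE(sql)" would boil down to: the sql has
        // been parsed exactly once, and the compiled form is stored
        // under a handle that is cheap to transmit and look up.
        synchronized int prepare(Object compiledForm) {
            int handle = nextHandle++;
            statements.put(handle, compiledForm);
            return handle;
        }

        // What a "CALL EXECUTE(handle)" or "EXECUTE handle" would boil
        // down to: find the compiled form without retransmitting or
        // reparsing its body.
        synchronized Object lookup(int handle) {
            Object compiledForm = statements.get(handle);
            if (compiledForm == null) {
                throw new IllegalArgumentException("no such handle: " + handle);
            }
            return compiledForm;
        }
    }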
> > First, a CALL must be made to prepare the statement,
> > saving it on the server under a handle (say CALL
> > prepare(sql) returns int), then another CALL must be
> > made to execute the prepared statement (say CALL
> > execute(int) returns Result). As such, the call to
> > execute the prepared statement must itself be
> > parsed,
>
> Check out the API for java.sql.PreparedStatement
> http://java.sun.com/j2se/1.4/docs/api/index.html
>
> The parser is called *once*, when the
> PreparedStatement is first constructed in
> Connection.prepareStatement.
> After that, simply set the parameters and call
> executeQuery().

I believe if you take a look at my script, you will see that the
"prepare" for each statement is executed just once, thereafter calling
execute on the resulting precompiled statement. I did not mention in the
previous post that the time for parsing and executing the requests to
prepare 1000 statements was actually quite low, at about 2.5 seconds.

Obviously, for my little test I did not include the ability to create
prepared statements with IN/OUT parameters. I simply "stole" the fact
that a full (non-parametric) select statement is already represented by
instances of the Select class (inserts/updates/deletes are executed
in-line with parsing, and as such cannot be trivially precompiled and
stored without first creating classes for them as well).

> Implemented correctly,
> PreparedStatement.executeQuery() should be a list of
> atomic operations on the db. Each operation is an
> object which supports an exec() function.
> Just go over the list and execute each operation.

Every SQL statement is atomic w.r.t. every other statement in the
broadest sense, so I assume you really mean "low-level atomic" here. The
only things that I can think of that are low-level atomic operations in
hsqldb are insertNoCheck and deleteNoCheck, in that each represents
writing a single line to the log. All other actions are composed and are
typically tree (rather than list) oriented.

In the abstract sense (since we actually do some of this in-line, mixed
with parsing), executing select, insert, update, and delete statements
consists of building a tree of nodes composed of TableFilter, Select,
Expression, etc. objects and visiting them all, building results,
modifying in-memory structures, and possibly recording individual steps
in the log. If one captures the list of low-level atomic operations that
processing such a tree outputs, it is highly unlikely that re-executing
that exact list, without consulting the tree and the tables used to
generate it, would have the same result, since the low-level operations
may be required to refer to rows that were themselves deleted during the
first execution. Also, if new parameters for a precompiled statement are
substituted, then the previously generated list definitely becomes
invalid.

To me, the appropriate level of data structure at which to store
precompiled statements is that of composed objects representing entire
precompiled top-level select, insert, update, and delete statements. But
these cannot be represented as lists; their most natural representation
is tree-like.

> This would be ideal for a threaded web server app that
> must connect to the db and execute a small set of
> queries with a few parameters over and over. It
> greatly reduces the overhead of parsing to almost
> nothing.
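For reference, the client-side model being described is the standard
java.sql.PreparedStatement idiom: parse once, bind and execute many
times. A minimal sketch (the connection URL and the mytable(id, data)
schema are illustrative, and of course whether the engine underneath
actually avoids reparsing is exactly the point under discussion; today
our JDBC driver rebuilds and resubmits the SQL text for each execution):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class PreparedLoop {
        public static void main(String[] args) throws SQLException {
            // Illustrative URL/credentials; assumes mytable exists.
            Connection conn = DriverManager.getConnection(
                    "jdbc:hsqldb:mem:test", "sa", "");

            // The SQL is submitted for compilation exactly once...
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT data FROM mytable WHERE id = ?");

            // ...and each iteration only binds parameters and executes.
            for (int id = 1; id <= 1000; id++) {
                ps.setInt(1, id);
                ResultSet rs = ps.executeQuery();
                while (rs.next()) {
                    rs.getString(1);
                }
                rs.close();
            }
            ps.close();
            conn.close();
        }
    }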
> > performing a CALL uses Java reflection to match the
> > procedure name with a java.lang.reflect.Method,
>
> I don't see the need for using Reflection at all if we
> use PreparedStatement.

> > As you can see, the overhead currently associated
> > with parsing and executing a "CALL" statement ruins
> > any benefit obtained by accessing a prepared
> > (precompiled) select statement, even a moderately
> > complex one.
> >
> > ~60% of statement processing time for the standard
> > test script is spent in the parsing phase,
>
> Ouch! That whole thing can be optimized away. At
> least when you are in a JSP or Java application with a
> simple set of queries.

Well, I think this is more difficult than one might expect. To provide
new values to a precompiled parameterized statement, one must provide a
language for doing so, and thus that language itself must also be
parsed, or at least some internal interface to do this must be provided,
above and beyond what JDBC provides on the client side of things.

As a simple example, let's say we implement the idea above and build
PREPARE and EXECUTE into the syntax, bypassing the need to use CALLs:

    PREPARE INSERT INTO mytable(id, data) VALUES(?,?)
    -- returns the value "1" to the caller, a handle to the
    -- precompiled statement

Obviously, we cannot just issue EXECUTE 1; we need to specify the
unknown values. So, let's say we introduce a naive little language
extension to do so:

    WITH 1 SET 1 = someid, 2 = somedata; EXECUTE 1;
    WITH 1 SET 1 = some_other_id, 2 = some_other_data; EXECUTE 1;

Ack! Now we must actually parse and execute two statements on the engine
side of things each time we would originally have parsed and executed
one statement there (rather than just recreating the SQL using the newly
supplied parameters on the JDBC side and resubmitting the new SQL to the
engine, as we do now).

This is where batching and host arrays come in:

    PREPARE INSERT INTO mytable(id, data) VALUES(?,?)
    -- returns the value "1" to the caller, a handle to the
    -- precompiled statement
    CREATE HOST ARRAY id_array(integer) AS (someid,...,the_last_id)
    CREATE HOST ARRAY data_array(varchar) AS (somedata,...,the_last_data)
    EXECUTE 1 USING id_array FOR 1, data_array FOR 2

This is very efficient, since there are only a few parses (or, if direct
calls or RMI were provided, only a small number of method invocations)
for a potentially large amount of work, and a large number of executions
take place in the last statement, bypassing a large number of parses
(method calls) to execute the target INSERT statement, as well as
reducing network traffic in a network client scenario. However, parsing
(making calls for) the second and third statements (creating and filling
the host arrays) might actually take a fair amount of time, so even this
form might not be as fast as one might like, although it is certainly
far more efficient overall for bulk operations. Unfortunately, the
number of cases where this is applicable in the context of processing
client (as opposed to internal) work is not as high as one might like.
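Incidentally, standard JDBC already exposes a client-side form of the
bulk idea above via statement batching. A minimal sketch (same
illustrative mytable schema and URL as before; how much of the
round-trip and parse overhead a given driver actually collapses per
batch is implementation-dependent):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class BatchInsert {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection(
                    "jdbc:hsqldb:mem:test", "sa", "");

            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO mytable(id, data) VALUES(?,?)");

            // Queue many parameter vectors against the one prepared
            // statement...
            for (int id = 1; id <= 1000; id++) {
                ps.setInt(1, id);
                ps.setString(2, "row " + id);
                ps.addBatch();
            }

            // ...then submit them all in one go.
            int[] updateCounts = ps.executeBatch();

            ps.close();
            conn.close();
        }
    }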
Your MEMORY table bulk load/store code is a good example of where
internal work can be sped up very substantially and made more efficient
using such concepts, but I do not think such things translate quite as
well to more general use-cases that must go through the public,
standardized client interfaces (or even proprietary RMI): usually one
cannot arrange to call a precompiled statement using just a single
vector of parameter values, thus requiring two steps (and possibly more
using JDBC, where changing each parameter value requires an additional
call): specifying the parameter vector and then specifying to re-execute
the precompiled statement with the new parameter values.

My experience with commercial databases seems to back this up. I have
performed tests using strictly plain SQL, comparing this to using
strictly precompiled cursors and stored procedures to do the same thing
on Oracle 7 and 8i. The differences in terms of overall time taken to
perform identical tasks have always turned out to be rather minimal. I
have found precompiled Oracle cursors and stored procedures to be most
useful for providing a structured and guaranteed interface to an
application, by turning off ad-hoc SQL access to a schema and forcing
everyone to go through the stored procedures instead (as in: revoke all
on (all SQL, such as select, insert, update, delete) from (all
users)... grant execute on (my stored procedures) to (my list of users
or allowed roles for using the application)).

> > with ~20% in actual update,
> > and another ~20% spent in logging
>
> We should have more fine grain control of logging.

The only things that are really logged (besides DDL) are single-row
inserts and single-row deletes (i.e. inserts strictly of the form
"insert into table values(list of all column values)" and deletes
strictly of the form specifying a predicate of equality to the primary
key). I do not see how providing any fine-grain control over the log
process can avoid logging these.

Plus, regardless of what one might do to improve the speed of the log, I
have determined that ~80% of the time is spent at the lowest level
(inside the actual low-level calls that write out to disk) and has very
little to do with the volume of data written in each call, depending
more on the absolute number of calls (and we really do need to call
write for each log line). So even if the surrounding code were sped up
dramatically (it's not doing that much as it is) and the volume of data
written to the log were reduced (say, as was my original idea, by
reducing operation specifiers and object names to numeric ids), the best
we could hope for is a 100% reduction in that portion. Let's be more
realistic, though, and say that with a great deal of work and thought we
could reduce the time spent in our own logging code by 50% (we cannot
reduce the time in native calls to write to disk). In the larger scheme,
this represents only a 10% reduction in the time taken in logging (since
~80% of it is beyond our control), or around a 2% savings overall; the
arithmetic is spelled out at the end of this message. OTOH, all we need
to get a similar overall savings is to improve the time taken in parsing
by about 3-4%. Which would you rather work on?

> > most direct way to improve the over-all performance
> > of the hsql database engine for short-running
> > statements is to really focus mostly on speeding up
> > and making the parser more efficient, if possible.
>
> Of course this is a good goal.
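To spell out the arithmetic behind the logging-versus-parsing comparison
above (all figures are the rough estimates already quoted in this
thread):

    logging's share of total statement time:   ~20%
    part of logging spent in our own code:     ~20% of that
      (the other ~80% is the native disk write itself)
    hoped-for speedup of our logging code:      50%
    logging time saved:                         50% x 20% = 10% of logging
    overall time saved:                         10% x 20% = ~2%

    parsing's share of total statement time:   ~60%
    parser speedup needed for the same ~2%:     2% / 60% = ~3.3%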
> =====
> Karl Meissner
>
> ...........................................................
> : Meissner Software Development, LLC                      :
> : "Software development and project management services." :
> : http://meissner.v0.net/resume.htm                       :
> ...........................................................

_______________________________________________
hsqldb-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/hsqldb-developers