----- Original Message -----
From: "Karl Meissner" <[EMAIL PROTECTED]>
To: "Campbell Boucher-Burnet" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, April 08, 2002 2:54 PM
Subject: Re: [Hsqldb-developers] On The Subject Of Prepared And Callable Statements:
> >As a simple example, let's say we implement the idea
> >above and build PREPARE and EXECUTE into the syntax
>
> We are definitely talking past each other.
>
> I don't think PREPARE should be an extension of the
> syntax (in SQL). I think that the PreparedStatement (in
> java.sql.* ) interface should be supported.

But all of the JDBC interface implementations must eventually talk SQL to the database *or* they must make direct calls, in which case something approximating RMI must be used to allow this to occur for network connections. Personally, I prefer straight SQL for network connections, since it would be very hard to add and support something approximating RMI, and I really doubt there would be a great performance advantage under something like that anyway.

OTOH, we could do something like simply continuing the current approach for network connections, but supply direct methods in Database for preparing and executing prepared statements, to be used if the iType of the connection is detected to be STANDALONE or MEMORY. What do you think? I will write a quick test for this in the next day or two. This is basically how my metadata patch works in jdbcDatabaseMetaData: if a local connection is detected, a direct call to a DIJdbcLibrary method is performed, whereas if a network connection is detected, the corresponding CALL statement is made, invoking the same DIJdbcLibrary method, but against the database instance on the other end of the network connection.

> >I believe you must be talking about a special,
> >HSQL-native, non-SQL,
> >non-JDBC protocol/interface. In that case, there
>
> Read the link. Please. This is part of java.sql from
> Sun. It is standard. Here is a better link
> http://java.sun.com/j2se/1.4/docs/api/java/sql/PreparedStatement.html

Yes, I am well aware of the interface. In addition to altering the current parse/execute mechanism to produce internal command objects with an exec method, instead of executing statements in-line with parsing as we do now, all that it would take to implement this really efficiently is for us to add the method:

Database.createPreparedStatement(String statement, Session session)

This kind of PreparedStatement would be an inner class of Database, allowing us great freedom and access while still avoiding having to open up the internals to an external class. (There is a rough sketch of what I mean a little further down.)

However, be aware that there is a substantial amount of work waiting for someone who wants to move on this. Not only must Delete, Insert, and Update classes be written, but also much code must be written to identify, access, and manipulate "?" parameters. At this point, there is absolutely nothing in Parser, Expression, TableFilter, etc. yet to handle parameterized SQL. Indeed, this is why I have limited my experiments strictly to static select statements, simply taking advantage of the fact that we do currently have a Select class, as well as a method in Parser (parseSelect) that returns Select objects ready for (multiple) execution.

In my original article, I also tested a naive implementation of a Delete object. The code is posted with that article at:

http://sourceforge.net/forum/forum.php?thread_id=633995&forum_id=73673

From all of this, I realized early on that a great amount of work remains to be done to add this sort of feature to HSQL. Indeed, that is why I have been doing these advance experiments: I have been trying to determine whether the expected performance increase is worth the great deal of work.
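To make that idea a bit more concrete, here is a rough, self-contained sketch of the shape I have in mind. None of these names exist in the code base (Engine stands in for Database, Compiled stands in for the future Select/Insert/Update/Delete command objects); it is only meant to show how an inner class could expose prepare-once/execute-many without opening up the internals:

//----------------------------------------------
// Sketch only: an engine that hands out reusable prepared statements as
// instances of an inner class, so jdbcPreparedStatement could simply
// delegate to one of these without the internals becoming public API.
public class Engine {

    // Parse exactly once and hand back a reusable, parameterizable object.
    public Prepared prepare(String sql) {
        Compiled command = parseToCommand(sql);
        return new Prepared(command);
    }

    public class Prepared {
        private final Compiled command;
        private final Object[] parameters;

        Prepared(Compiled command) {
            this.command = command;
            this.parameters = new Object[command.parameterCount()];
        }

        public void setParameter(int position, Object value) {
            parameters[position - 1] = value;   // 1-based, as in JDBC
        }

        // Re-resolve the "?" slots with the current values and execute,
        // skipping the parse entirely.
        public Object execute() {
            return command.execute(parameters);
        }
    }

    // Stand-ins so the sketch compiles; this is where the real work lives.
    interface Compiled {
        int parameterCount();
        Object execute(Object[] parameters);
    }

    private Compiled parseToCommand(String sql) {
        throw new UnsupportedOperationException("parser work not done yet");
    }
}
//----------------------------------------------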
How disappointing it would be if, after months of hard work, it turned out that the performance gains were only minimal.

> > - if prepared statements are created on a database
> > instance across the
> > network,
>
> In terms of network overhead on a distributed system,
> it should be no different than a normal Statement
> object. Maybe less.

That is implementation-specific. Certainly, there are many approaches that could be taken to try to ensure that it would be less in most cases.

> > - there must still be a mechanism to submit a SQL
> > statement to be prepared,
>
> Send a query as a string to the connection.
>
> Look, this is how you could use it in a Java app
> to bulk load some records:
>
> //----------------------------------------------
> PreparedStatement pstmt = con.prepareStatement(
>     "UPDATE EMPLOYEES SET SALARY = ? WHERE ID = ?");
>
> // no more HSQL parsing from this point on
> // .....get some data.....
>
> for (int i = 0; i < 1000000; ++i) {  // a 60% speed up is huge
>     pstmt.setBigDecimal(1, read_salary(i));
>     pstmt.setInt(2, read_id(i));
>     pstmt.execute();
> }

Yes, but if the execute is inside the loop like this (rather than using batching), then the newly supplied parameters must be used to resolve the precompiled statement again and again, each time pstmt.execute is called. To get really high performance, one would need to rewrite the loop like:

for (int i = 0; i < 1000000; ++i) {  // a 60% speed up is huge
    pstmt.setBigDecimal(1, read_salary(i));
    pstmt.setInt(2, read_id(i));
    pstmt.addBatch();
}
pstmt.executeBatch();

And this would only increase performance if our command objects (Select, Insert, Update, Delete) were written such that host arrays (implemented as a Vector, most likely) could be used to set the batch values for each parameter. That, and obviously our Database inner class PreparedStatement would have to be written to build up such things in response to addBatch and pass them along accordingly.

> > precompiled statements that is shorter than
> > retransmitting their body, once
>
> Prepared statements should be a list of atomic
> operation objects. They can be stored locally or in an
> indexed registry in the server. Server side storage
> may be faster but more complicated to implement.

True enough. Actually, with the implementation suggested above, the need to reference engine-side PreparedStatements by handle is eliminated.

> > - once we have a handle to a precompiled statement,
> > we must have a mechanism
> > to indicate to the engine to (re)execute the
> > precompiled statement to which
> > the handle refers. As I see it, we have two
> > choices: use CALL
> > EXECUTE(handle) syntax, or build EXECUTE directly
> > into the syntax:
> > EXECUTE handle.
>
> Well this part is the interesting part :-)
> The output of the parse should be a list of objects.
> Each object represents a single atomic operation on
> the db. In a Statement they are executed immediately.
> In a prepared statement they are stored.

Yes. Allowing parsing of a whole SQL document into a list of precompiled statements would of course present some additional challenges, especially if a collection of parameterized statements submitted at once is to be encapsulated as a single PreparedStatement, since additional code would have to be put in place to map each external parameter position to both a list position and a parameter position within the item at that list position. Perhaps to start out, preparing should throw if more than one atomic statement is submitted.
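Just to illustrate that mapping problem, the engine side would need something along these lines (purely hypothetical, nothing like it exists yet): given the number of "?" markers in each compiled item of the list, an external 1-based position has to be translated into a (list index, local position) pair.

//----------------------------------------------
// Hypothetical sketch: translate the external, 1-based "?" position of a
// multi-statement PreparedStatement into (statement index, local position).
public class ParameterMap {

    private final int[] statementIndex;  // which list item owns external parameter i
    private final int[] localPosition;   // 1-based position of that "?" within the item

    // paramCounts[s] = number of "?" markers in compiled statement s
    public ParameterMap(int[] paramCounts) {
        int total = 0;
        for (int s = 0; s < paramCounts.length; s++) {
            total += paramCounts[s];
        }
        statementIndex = new int[total];
        localPosition  = new int[total];

        int external = 0;
        for (int s = 0; s < paramCounts.length; s++) {
            for (int p = 1; p <= paramCounts[s]; p++) {
                statementIndex[external] = s;
                localPosition[external]  = p;
                external++;
            }
        }
    }

    public int statementFor(int externalPosition) {
        return statementIndex[externalPosition - 1];
    }

    public int positionFor(int externalPosition) {
        return localPosition[externalPosition - 1];
    }
}
//----------------------------------------------

So for "INSERT INTO T VALUES(?, ?); DELETE FROM T WHERE ID = ?", the map built from {2, 1} would send external position 3 to list item 1, local position 1. Not hard, but it is one more piece of code that simply does not exist yet.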
> > Every SQL statement is atomic w.r.t. every other
> > statement in the broadest sense, so I assume you
> > really mean "low-level atomic" here.
> >
> > The only things that I can think of that are low
> > level atomic operations in hsqldb are insertNoCheck
> > and deleteNoCheck, in that
>
> Also the Expression objects

Oops. True enough, I guess, although looked at from another level, Expressions are composed.

> > each represents writing a single line to the log.
> > All other actions are composed and are typically
> > tree (rather than list) oriented.
>
> Any tree recursion can be converted to a list with a
> call stack. But that is not too important. The
> details of the data structure used, whether list or
> tree, should only matter to the internals of the Parser
> and the Statement.
>
> > If one captures the resulting low level atomic
> > operations that processing such a tree outputs, it
> > is highly unlikely that re-executing that exact list,
> > in the absence of consulting the tree and tables used
> > to generate it, would have the same result, since the
> > low level operations may be required to refer to rows
> > that themselves were deleted
>
> I think we are talking past each other again.

Yes. I think your atomic operations are things like insert, update, and delete, which can affect multiple rows, whereas mine are things like the individual insertNoCheck and deleteNoCheck calls that make up performing top level insert, update, and delete operations.

> Envision bytecode with a stack. The bytecodes are
> atomic operations. The stack holds whatever partial
> result sets are generated as operations are processed.
>
> The operations for a particular query should not
> change, only the inputs, data and stack.
>
> >Also, if new parameters for a precompiled
> >statement are substituted, then the previously
> >generated list definitely becomes invalid.
>
> The list of operations is not invalid.
> The stack of partial results would be discarded each
> execution.

OK, we really are talking past each other. If the list of operations is top-level inserts, updates, and deletes, then yes, but in that case there is no shared stack of partial results, unless we go totally crazy and implement something like the Views 'R Us approach at the link I posted in the Roadmap section:

http://www.cs.umd.edu/~nick/papers/ViewsRus.html

(Which reminds me: what the hell happened to the RoadMap section of the forums? I sure hope that stuff didn't get deleted. I have several articles and my reading list posted there, none of which I have backed up here.)

On the other hand, if we are talking about the individual operations that make up executing a statement (the tree of relational operators that makes up the execution pipeline, often referred to as the PLAN for an individual top level statement), then:

1.) the tuples in the pipeline during a previous execution may have been removed from their containing relations, or modified, by the time the next execution occurs, meaning that they no longer satisfy the predicates of the statement;

2.) changing the parameters changes the predicates and thus may invalidate the PLAN itself (once we introduce an optimizer), requiring the PLAN to be rebuilt;

3.) changing the parameters may require complex Expressions to be resolved again.

In any case, there is not really a stack of partial results shared between top level statements, and no execution-dependent things, such as gathered tuples, should hang around inside the PLAN for each top level statement between executions.
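In code terms, what I am arguing for is that the only thing a prepared statement keeps between executions is its (immutable) PLAN; everything execution-dependent is rebuilt on each call. A rough sketch, with made-up names rather than existing classes:

//----------------------------------------------
// Sketch only: the compiled PLAN is the sole state carried between
// executions; results are built from scratch against the live tables on
// every call, with the current parameter values bound in each time.
import java.util.Vector;

public class PreparedSelect {

    private final Object plan;   // immutable tree of relational operators

    public PreparedSelect(Object plan) {
        this.plan = plan;
    }

    public Vector execute(Object[] currentParameters) {
        // A fresh result for every execution: rows gathered during a previous
        // run may since have been deleted or modified, so nothing from that
        // run is consulted or kept.
        Vector result = new Vector();
        walk(plan, currentParameters, result);
        return result;
    }

    // Placeholder for the real pipeline: bind the parameter values into the
    // predicate Expressions, scan the TableFilters, collect qualifying rows.
    private void walk(Object plan, Object[] parameters, Vector result) {
    }
}
//----------------------------------------------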
> >Well, I think this is more difficult than one might
> >expect.
>
> Ok.
> It is not easy.
>
> >To provide new values to a precompiled parameterized
> >statement, one must provide a language for doing so,
> >and thus that
>
> No no no. Please read the Sun API and all will be
> clear.

All is clear, and was clear before. What I said was:

To provide new values to a precompiled parameterized statement, one must provide a language for doing so, and thus that language itself must also be parsed, or at least some internal interface to do this must be provided, above and beyond what JDBC provides on the client side of things.

By "language", I simply meant a WAY: a set of steps, some code and methods above and beyond the empty Sun interface. I guess I used the term LANGUAGE too loosely here, causing confusion. I also think I made "this" an ambiguous pronoun: by "to do this", I meant provide values, not parse a syntax extension to SQL. Sorry for being unclear.

As for the extensions to SQL I provided in the examples, my intention was to use such things as an illustration only, since it is generally easier to write some SQL pseudo code than to detail some Java interfaces and implementations. However, if it were decided to allow efficient access to such features from SQL scripts, then I think SQL syntax extensions would have to be preferred over CALLs, based on the performance I observed in the last test.

That said, it may be that CALLs are actually much slower than they need to be, based on a discussion I just had with Fred:

1.) It turns out, for instance, that CALLs to methods with Connection as the first parameter create a new Connection instance for each call to Function.getValue(), instead of simply creating a single INTERNAL connection for each Session object (was Channel, now renamed).

2.) Method objects are not cached, so each parse of a CALL instantiates a new Method object corresponding to the stored procedure.

3.) When INTERNAL connections are finalized, they call closeStandalone(), which is wrong (Fred refers to this as a phony). As such, every finalization or explicit close of an INTERNAL connection object throws an NPE, because INTERNAL connections have a null database name, and one of the first actions of closeStandalone() is to try to get() the database corresponding to the connection's database name.

Anyway, based on these observations, there are probably many places where we can speed up the processing of CALL statements, although I realize this is off topic w.r.t. your frustrated "no, no, no" exclamation.
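For 1.) and 2.), the shape of the fix is fairly obvious. This is only a sketch of the idea, not a patch against the actual Function/Session code, and the helper names are made up: cache the resolved Method per stored procedure name so the reflection lookup happens once rather than on every parse of a CALL, and keep a single lazily created INTERNAL connection per Session instead of building a new Connection on every Function.getValue().

//----------------------------------------------
// Sketch only: a Method cache keyed by "class.method", plus one reusable
// INTERNAL connection per session.
import java.lang.reflect.Method;
import java.sql.Connection;
import java.util.Hashtable;

public class CallSupport {

    // One entry per fully qualified stored procedure name.
    private static final Hashtable methodCache = new Hashtable();

    static Method getMethod(String className, String methodName)
            throws ClassNotFoundException {
        String key = className + "." + methodName;
        Method method = (Method) methodCache.get(key);
        if (method == null) {
            Method[] candidates = Class.forName(className).getMethods();
            for (int i = 0; i < candidates.length; i++) {
                // first method with a matching name; real code would also
                // check the parameter types
                if (candidates[i].getName().equals(methodName)) {
                    method = candidates[i];
                    break;
                }
            }
            if (method != null) {
                methodCache.put(key, method);
            }
        }
        return method;
    }

    // One INTERNAL connection per session, created lazily and reused for the
    // life of the session rather than once per CALL.
    private Connection internalConnection;

    Connection getInternalConnection(Object session) throws Exception {
        if (internalConnection == null) {
            internalConnection = openInternalConnection(session);
        }
        return internalConnection;
    }

    // Placeholder: the engine-specific way of opening an INTERNAL connection.
    private Connection openInternalConnection(Object session) throws Exception {
        throw new UnsupportedOperationException("engine-specific; sketch only");
    }
}
//----------------------------------------------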
> >Unfortunately, the actual number of cases where this
> >is applicable in the context of processing client (as
> >opposed to internal) work is not as high as one might
> >like.
>
> I can't speak for what other developers do, but on
> applications that I have implemented, like web pages
> backed by JSP backed by a DB, there are a few (under
> 10) queries that account for 80% to 90% of the
> activity on the DB. Typically the form of the query
> stays the same but the input values change.
>
> example
> SELECT userid FROM login_tlb WHERE passwd='blah'

Yes, but commercial databases typically also use transparent, pervasive statement caching, so once a statement is parsed the first time, it is generally looked up thereafter if possible, for the duration of a session or at least until it is booted out of the cache for some reason; hence my finding that the performance differential is not that great most of the time.

Plus, under typical OLTP loads, such statements are more likely to be spread out over time than executed in one large batch, so the perceived performance gains are nominal. Also, if a statement processes in 1 ms, one does not perceive a slowdown until more than 1000 of them are submitted per second; only then does one even notice the performance gains realized by using precompiled statements over submitting plain SQL. And if the statement takes, say, 20 ms to process, then eliminating the parse overhead and saving, say, 0.6 ms of parse time represents only a 3% saving. Indeed, if a statement takes seconds to process, bypassing the parse offers essentially no savings, relatively speaking. So the use of prepared statements has a rather narrow window of opportunity in which to make a big difference: only when executing a great number of very short running statements is it likely to provide any noticeable improvement.

As such, one is far more likely to solve JSP performance problems by clustering and expanding the cluster than by the liberal use of PreparedStatements. In fact, I'd wager that the overhead of processing the HTTP requests and the round trips between the web server and the back end would always be very large compared to the time saved by using PreparedStatements. So, even though one might be saving 40-60% per statement on the back end, the actual saving per HTTP request would be nominal.

> Taking 60% of the load off these queries would be huge.
>
> >I have found using precompiled Oracle cursors and
> >stored procedures to be most useful when providing a
> >structured and guaranteed interface to an application,
> >by turning off ad-hoc SQL access to a schema, forcing
> >everyone to go
>
> I agree 100%. I just think that is a very common,
> useful situation. But maybe not for your app.
> *shrug*

> >I do not see how providing any fine-grain control
> >over the log process can avoid logging these. Plus,
> >regardless of what one might do to improve the speed
> >of the log, I have determined that ~80% of the time
> >is spent in the lowest level (inside the actual
> >low-level calls to write out to disk) and has very
> >little to do with the volume of data written in each
> >call, depending more
>
> Exactly. Which is why we need better control over
> logging.

I meant the low level calls to the NATIVE write method in FileOutputStream. That is an area over which we have no control at all, I think, and it is highly unlikely to be helped even by JNI code, at least without turning hsql(db) into a Java interface to a mostly native code DBMS.

> There are many applications where speed is more
> important than data recovery in the event of a crash.
> For those applications a 'lazy write back' would be
> much more efficient. Only write dirty pages on cache
> flush or when the system is idle. I admit you would
> not do bank transactions this way but it would work
> great for a video game that kept game objects in a db.

In that case, you could probably save quite a bit, with close to no extra work, by using a MEMORY mode database. To load and save game objects, simply add the ability to explicitly load and persist individual tables using your memory table bulk load/save code, no? Upon attaining each level, simply save out the latest version of the table tracking game progress.
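Even before any engine-side bulk save/load of individual tables exists, a game could get by with something crude like the following, using nothing but plain JDBC. This is a sketch only: the table name is whatever the game uses, the value quoting is naive (no escaping, no date handling), and a real engine-level save would of course be much faster.

//----------------------------------------------
// Sketch: dump one table of a MEMORY database to a file as INSERT
// statements, and reload it later by replaying them.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class TableSnapshot {

    // Write every row of 'table' to 'file' as an INSERT statement.
    public static void save(Connection con, String table, File file) throws Exception {
        Statement stmt = con.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT * FROM " + table);
        ResultSetMetaData meta = rs.getMetaData();
        int cols = meta.getColumnCount();
        PrintWriter out = new PrintWriter(new FileWriter(file));
        while (rs.next()) {
            StringBuffer sql = new StringBuffer("INSERT INTO " + table + " VALUES(");
            for (int i = 1; i <= cols; i++) {
                if (i > 1) {
                    sql.append(",");
                }
                Object value = rs.getObject(i);
                if (value == null) {
                    sql.append("NULL");
                } else if (value instanceof Number) {
                    sql.append(value.toString());
                } else {
                    sql.append("'").append(value.toString()).append("'");
                }
            }
            sql.append(")");
            out.println(sql.toString());
        }
        out.close();
        rs.close();
        stmt.close();
    }

    // Clear the table and replay the saved INSERT statements.
    public static void load(Connection con, String table, File file) throws Exception {
        Statement stmt = con.createStatement();
        stmt.executeUpdate("DELETE FROM " + table);
        BufferedReader in = new BufferedReader(new FileReader(file));
        String line;
        while ((line = in.readLine()) != null) {
            if (line.length() > 0) {
                stmt.executeUpdate(line);
            }
        }
        in.close();
        stmt.close();
    }
}
//----------------------------------------------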
> >In the larger scheme, this represents only a 10%
> >reduction in the time taken in logging (since ~80% is
> >beyond our control), or around a 2% savings overall.
>
> Not if we can drive parsing to zero ;-)
>
> >. Which would you rather work on?
>
> Well I like data structures a lot...

Perhaps you could work with Fred, then, on restructuring the code base so that we use Delete, Update, and Insert objects encapsulating the data structures needed to execute these operations, instead of performing the executions in-line with parsing? This would go a long way toward providing the performance increases we would all like to see via PreparedStatements.

> ...........................................................
> : Meissner Software Development, LLC :
> : "Software development and project management services." :
> : http://meissner.v0.net/resume.htm :
> ...........................................................

_______________________________________________
hsqldb-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/hsqldb-developers