I think any solution that requires less interaction from the client
is better and fits better with the zero admin goal of Derby. I wonder
if it would be good to take the "store/retrieve" statement cache idea
one step further.
Rather than just persist the statement cache, instead create an indexed
table that acts as the backing store for the statement cache. Seems
like it should be sized bigger than the in memory statement cache and
the statement cache could then be changed to fault things from it when
it got a miss.
some problems:
1) probably don't want to store infinite plans, there are too many bad
apps out there that for instance every insert is actually a different
query. So seems like it would be good to have a max size on the on
disk cache.
2) maybe want to limit types of plans to look for. Some options might
be only select plans with N or greater tables in from list? Or maybe
only plans that took longer than N ms. to compile (should figure out
what the overhead of doing a lookup and serializing in a plan from disk
is).
3) It would be nice to not slow down existing compile of queries too
much. Again what is the cost of a lookup in the on disk table for a
miss?
4) Need to make sure validation of these plans work. This was an area
that caused a number of corrupt databases in cloudscape. The basic
problem was that we would use a stored plan incorrectly when it should
have been invalidated because some sort of ddl event happened since the
original compile. Easiest way for this to lead to a corrupt db is that
an index was added, and the old compiled plan did not know about it and
thus when an insert happened the new index would not get an entry
leading to a mismatch between new index and table. This is not an
issue for system stored plans that get invalidated every release. There
must be a mechanism for invalidated the trigger ones, but I don't know
if existing invalidation will work in all cases for non trigger use.
Knut Anders Hatlen wrote:
Nick Puz <[email protected]> writes:
Hi Knut,
Thanks for the response, I'll take a look into the cloudscape docs and
how execute is implemented. I'll also look more at the statement cache
to see about persisting it on shutdown and restoring on startup, so
that then it could be done globally for all statements (as an
option). Another alternative would be to let the user specify which to
persist as a comment in the sql (as is done with the optimizer
overrides to specify index usage, etc.), possibly as a different
runtime option. The advantage of the first (global) is that apps could
take advantage with no sql changes. Perhaps persist the N most
recently used statements...
Hi Nick,
I agree that no SQL changes required is an advantage. Applications could
start taking advantage of it immediately with no changes (except perhaps
setting a flag to enable the persisting of the statement cache), and no
new, non-standard syntax is required.
As to using comments, that would require changes in the SQL, but the SQL
would still be portable since the comments would simply be ignored by
other DMBSs.
Adding the ability to store user-defined prepared statements with some
special syntax is probably the easiest to implement, since most of the
infrastructure is already in place, but then the applications would need
to use unportable SQL. (Although we don't actually need to add new
syntax. We already have EXECUTE STATEMENT, and we could just create a
system procedure to store the statements, something like CALL
SYSCS_UTIL.SYSCS_STORE_STATEMENT('APP', 'MYSTMT', 'SELECT * FROM
T'). But even if the syntax is portable, the actual statement will still
be tied to Derby.)
So, not considering the complexity of the different implementations, a
persistent statement cache and/or use of comments sounds more attractive
to me.
If we go for persisting the statement cache, I think it would be easier
to just store the entire cache than to store the N most recently used,
both because Derby's statement cache currently doesn't have a notion
about "most recently used", and because the statement cache normally is
so small (25 statement by default, I think) as I wouldn't think the
extra complexity is worthwhile.
In terms of invalidating, a crude but simple way would be to have the
version of derby stored with the prepared statements and just to
discard them if the version numbers differ. The same mechanism coudl
be used as when deciding to recompile the system stored
statements. What do you think?
Reusing as much as possible of the existing code would be good. So
basically what needs to be done is
1) On shutdown, if persistent cache is enabled, store each of the
cached statements in SYS.SYSSTATEMENTS.
2) On startup, move the stored statements from SYS.SYSSTATEMENTS back
into the cache, and remove them from SYS.SYSSTATEMENTS.
3) On upgrade, Derby will automatically invalidate all compiled plans
for statements in SYS.SYSSTATEMENTS, so no need for any special
handling for the persisted cached statements (though, strictly
speaking it would be better to drop those statements than to fill the
cache with uncompiled statements).
4) On soft-upgrade, I think it might be a good idea to disable
persisting of the cache entirely, since we wouldn't want to end up
with SYS.SYSSTATEMENTS containing statements compiled for different
versions of Derby.
I think (must check to be sure) that in the current code, Derby needs to
know whether or not the statement should be a stored prepared statement
or an ordinary statement when the statement is compiled, and that an
entry is added in the SYS.SYSSTATEMENTS table before the statement is
compiled. If this is done for all statements, even the ones that won't
still be in the cache when the database shuts down, there will be an
extra, unnecessary compile-time cost for those statements.
This could be stated as an expected overhead caused by enabling the
persistent cache, but it also makes a solution where the users
explicitly say which statements they want to persist attractive. That
again leaves us with the question of how to dispose of them, since we
surely don't want to keep all the different variations over the same
statement that different versions of an application have used. Removing
such statements from SYS.SYSSTATEMENTS when they are evicted from the
statement cache is one option, I would guess.
Another perhaps better option if we want to store the entire statement
cache, is to find a way to convert a non-storable statement to a
storable statement when we decide that it should be stored. Don't know
how difficult that would be.
Besides version changes the statements should probably be
removed/regenerated if the statistics used to determine the execution
plan change. How does derby manage the statistics, are they
automatically kept up to date or does the user have to do something
specific? Do you happen to know what happens with the stored trigger
actions, since they should have the same problem. I'll take a look at
the code to try to figure it out in the next few days but just curious
if you or another derby dev knows...
There is an undocumented (doc request logged as DERBY-3134) property,
derby.language.stalePlanCheckInterval, which tells how often the plan
for a prepared statement should be checked. If the plan is found to be
stale (not sure if it looks at cardinality statistics or if it just
checks the estimated size of the tables involved) a recompile is
initiated. If the statement is an SPS, the new plan is also written back
to the system table.