[
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mamta A. Satoor updated DERBY-3788:
-----------------------------------
Attachment: DERBY3788_update_all_missing_stats_after_bind_patch4_diff.txt
DERBY3788_update_all_missing_stats_after_bind_patch4_stat.txt
I have an intermediate patch ready for review. The goal of the patch is to
detect early on in the compile phase of a SELECT query if there are tables
involved with missing statistics. If yes, then abandon the SELECT query
compilation, create the missing statistics and then try to compile the original
SELECT query again. This logic gets driven inside the
GenericStatement.prepMinion code. GenericStatement.prepMinion is responsible
for compile phase of a SQL. If the SQL involved is a SELECT statement, then we
may go through the compile process of the SELECT query twice. If the SQL
involved is non-SELECT, we will finish SQL compilation in the first pass. The
details about the code flow for SELECT statement compilation is as follows.
For a SELECT statement, we may have to go through the query compilation phase
twice. Whether we go through one cycle or 2 cycles of compilation phase depends
on wheter there are any tables involved in the query who have their statistics
missing. During the first attempt of query compile, we will make a list of all
the tables involved in the query during the bind phase of compile. After the
bind phase, during the first pass through the SELECT query compilation, we will
check if the query involved has any tables whose statistics are missing. If
yes, then we will quit from the SELECT compile phase by throwing missing
statistics exception. We will handle the missing stats exception by trying to
update the missing statistics. After that, we will go through the second pass
of original query compilation. During this second pass, we will not worry if
the statistics are missing or not. We will just work with whatever statistics
are available (one example where the statistics may still be missing after
trying to create them is say a user who only has select privileges on a table
finds the missing stats. But the statistics creation will fail because the user
does not have sufficient privileges to create the statistics. There can be
other reasons for statistics creation failure too. Another instance would be
that we can't get the necessary locks to update the stats. In such a case, we
will just quit out of updating the stats and move on to the second pass of the
original query compilation.
One of the issues with this patch I need to work on and will appreicate if
anyone has any feedback on. The update statistics is happening in the user
transaction. What that means is that any locks required to update statistics
will stay in place until the user transaction has been committed/rolled back.
This behavior can be an issue with user applications (since these are the locks
the user was not expecting to get as part of a SELECT query execution). This
behavior definitely causes problems with our tests where quite a handful of
tests run into locking issues because of the additional locks acquired before
of a SELECT query. I think one way to fix this would be to somehow run the
statistics in a nested transaction which can be committed after the statistics
are created. If we run into locking issues in the nested transaction then go
ahead and run the statistic in the user transaction. This is what we do for
GENERATED columns in InsertResultSet.getSetAutoincrementValue. One of the
problems that I need to address before I can use nested transaction is to
change the code that is written to fire statistics during SELECT compilation.
Currently, in my patch, I am executing the stored procedure
SYSCS_UTIL.SYSCS_UPDATE_STATISTICS which internally executes ALTER TABLE.. to
update the statistics. I am at too high a level to be able to use nested
transaction for updating the statistics (let me know if I am wrong here. May be
there is a way to use nested transaction while firing the
SYSCS_UTIL.SYSCS_UPDATE_STATISTICS that I am unaware of). I think I need to get
to update statistics code in AlterTableConstantAction directly somehow without
going through the stored procedure->ALTER TABLE.. sql path. I think if I can
directly call the update statistics in AlterTableConstantAction, then I can try
using the nested yransaction and if that does not succeed then user the user
transaction.
I will appreciate any input you might have on my overall approach to this jira
entry and then how to address the locking issue.
> Provide a zero-admin way of updating the statisitcs of an index
> ---------------------------------------------------------------
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
> Issue Type: New Feature
> Components: Performance
> Affects Versions: 10.5.0.0
> Reporter: Mamta A. Satoor
> Assignee: Mamta A. Satoor
> Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt,
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt,
> DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt,
> DERBY3788_update_all_missing_stats_after_bind_patch4_diff.txt,
> DERBY3788_update_all_missing_stats_after_bind_patch4_stat.txt,
> DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good
> for Derby to provide an automatic way of updating the statistics without
> requiring to run the stored procedure manually. There was some discussion on
> DERBY-269 about providing the 0-admin way. I have copied it here for
> reference.
> *********************
> Kathey Marsden - 22/May/05 03:53 PM
> Some sort of zero admin solution for updating statistics would be prefferable
> to the manual 'update statistics'
> *********************
> *********************
> Mike Matrigali - 11/Jun/08 12:37 PM
> I have not seen any other suggestions, how about the following zero admin
> solution? It is not perfect - suggestions welcome.
> Along with the statistics storing, save how many rows were in the table when
> exact statistics were calculated. This number is 0 if none have been
> calculated because index creation happened on an empty table. At query
> compile time when we look up statistics we automatically recalculate the
> statistics at certain threshholds - say something like row count growing past
> next threshhold : 10, 100, 1000, 100000 - with upper limit being somewhere
> around how many rows we can process in some small amount of time - like 1
> second on a modern laptop. If we are worried about response time, maybe we
> background queue the stat gathering rather than waiting with maybe some quick
> load if no stat has ever been gathered. The background gathering could be
> optimized to not interfere with locks by using read uncommitted.
> I think it would be useful to also have the manual call just to make it easy
> to support customers and debug issues in the field. There is proably always
> some dynamic data distribution change that in some case won't be picked up by
> the automatic algorithm. Also just very useful for those who have complete
> control of the create ddl, load data, run stats, deliver application process.
> *********************
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.