[
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671349#action_12671349
]
Mamta A. Satoor commented on DERBY-3788:
----------------------------------------
Currently, I am pursuing running the missing statistics inline in the compile
phase of a SELECT query. The compile phase code is driven by
GenericStatement.prepMinion.
My first attempt was to run the update statistics inline in the optimize phase
of a SELECT sql (because in the optimize phase of a query, we look for
statistics for determining the query plan for execution and that is when we
know that the statistics are missing) but there I ran into data dictionary
being in read-only mode. In GenericStatement.prepMinion method, at line 307, we
mark the data dictionary(DD) as read-only for the duration of bind and optimize
phase and hence statistics can't be updated because DD has been marked readonly
at this time.
At line 422, after the optimize phase, we mark the DD as done reading and hence
DD is not in read-only mode during the generate phase. Based on this, I tried
running the statistics inline in the generate phase of SELECT query to
establish the fact that the statistics can indeed be run in-;ine. I ran into
only one coding issues DERBY-4048 (about AlterTableConstantAction not using
execute transaction for updating the statistics). After resolving DERBY-4048, I
am able to run the update statistics inline in the generate phase of a SELECT
query for a simple test case as shown below
create table t1 (c11 int, c12 char);
create index i1 on t1(c11);
insert into t1 values (1,'1'),(2,'2'),(3,'3');
-- no statistics will be found for t1.i1 because table was empty when index i1
was created
select * from sys.sysstatistics;
-- during the generate phase of following query, we will run update statistics
inline for t1.i1
select * from t1 where c11=1;
-- this time, there will be statistics for t1.i1
select * from sys.sysstatistics;
Note, that there were no locking issues/privilege issues involved in the simple
test case above. But when update statistics runs into some problem, the
question is how do we just revert back the update statistics work and let the
original SELECT query proceed with execution. When I ran junit, towards the
very beginning, I saw lot of problems related to not having enough privileges,
context stack issue etc because of update statistics.
I think the right approach would be to start with the compile phase of SELECT
query and during optimization, when we find that the statistics are missing,
then quit the compilation of SELECT query there, run the update statistics and
refire the compilation of the SELECT query. This way, update statistics will
not run into any locking issues with SELECT query. I am going to try to spend
some time working on this approach and see how it goes. Would appreicate if
anyone has any feedback.
> Provide a zero-admin way of updating the statisitcs of an index
> ---------------------------------------------------------------
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
> Issue Type: New Feature
> Components: Performance
> Affects Versions: 10.5.0.0
> Reporter: Mamta A. Satoor
> Assignee: Mamta A. Satoor
> Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt,
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt,
> DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, DERBY_3788_Mgr.java,
> DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good
> for Derby to provide an automatic way of updating the statistics without
> requiring to run the stored procedure manually. There was some discussion on
> DERBY-269 about providing the 0-admin way. I have copied it here for
> reference.
> *********************
> Kathey Marsden - 22/May/05 03:53 PM
> Some sort of zero admin solution for updating statistics would be prefferable
> to the manual 'update statistics'
> *********************
> *********************
> Mike Matrigali - 11/Jun/08 12:37 PM
> I have not seen any other suggestions, how about the following zero admin
> solution? It is not perfect - suggestions welcome.
> Along with the statistics storing, save how many rows were in the table when
> exact statistics were calculated. This number is 0 if none have been
> calculated because index creation happened on an empty table. At query
> compile time when we look up statistics we automatically recalculate the
> statistics at certain threshholds - say something like row count growing past
> next threshhold : 10, 100, 1000, 100000 - with upper limit being somewhere
> around how many rows we can process in some small amount of time - like 1
> second on a modern laptop. If we are worried about response time, maybe we
> background queue the stat gathering rather than waiting with maybe some quick
> load if no stat has ever been gathered. The background gathering could be
> optimized to not interfere with locks by using read uncommitted.
> I think it would be useful to also have the manual call just to make it easy
> to support customers and debug issues in the field. There is proably always
> some dynamic data distribution change that in some case won't be picked up by
> the automatic algorithm. Also just very useful for those who have complete
> control of the create ddl, load data, run stats, deliver application process.
> *********************
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.