[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

Mamta A. Satoor (JIRA) Fri, 06 Feb 2009 14:01:25 -0800

    [ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671349#action_12671349
 ]


Mamta A. Satoor commented on DERBY-3788:
----------------------------------------

Currently, I am pursuing running the missing statistics inline in the compile 
phase of a SELECT query. The compile phase code is driven by 
GenericStatement.prepMinion. 

My first attempt was to run the update statistics inline in the optimize phase 
of a SELECT sql (because in the optimize phase of a query, we look for 
statistics for determining the query plan for execution and that is when we 
know that the statistics are missing) but there I ran into data dictionary 
being in read-only mode. In GenericStatement.prepMinion method, at line 307, we 
mark the data dictionary(DD) as read-only for the duration of bind and optimize 
phase and hence statistics can't be updated because DD has been marked readonly 
at this time. 

At line 422, after the optimize phase, we mark the DD as done reading and hence 
DD is not in read-only mode during the generate phase. Based on this, I tried 
running the statistics inline in the generate phase of SELECT query to 
establish the fact that the statistics can indeed be run in-;ine. I ran into 
only one coding issues DERBY-4048 (about AlterTableConstantAction not using 
execute transaction for updating the statistics). After resolving DERBY-4048, I 
am able to run the update statistics inline in the generate phase of a SELECT 
query for a simple test case as shown below
create table t1 (c11 int, c12 char);
create index i1 on t1(c11);
insert into t1 values (1,'1'),(2,'2'),(3,'3');
-- no statistics will be found for t1.i1 because table was empty when index i1 
was created
select * from sys.sysstatistics;
-- during the generate phase of following query, we will run update statistics 
inline for t1.i1
select * from t1 where c11=1;
-- this time, there will be statistics for t1.i1
select * from sys.sysstatistics;

Note, that there were no locking issues/privilege issues involved in the simple 
test case above. But when update statistics runs into some problem, the 
question is how do we just revert back the update statistics work and let the 
original SELECT query proceed with execution. When I ran junit, towards the 
very beginning, I saw lot of problems related to not having enough privileges, 
context stack issue etc because of update statistics.

I think the right approach would be to start with the compile phase of SELECT 
query and during optimization, when we find that the statistics are missing, 
then quit the compilation of SELECT query there, run the update statistics and 
refire the compilation of the SELECT query. This way, update statistics will 
not run into any locking issues with SELECT query. I am going to try to spend 
some time working on this approach and see how it goes. Would appreicate if 
anyone has any feedback.

> Provide a zero-admin way of updating the statisitcs of an index
> ---------------------------------------------------------------
>
>                 Key: DERBY-3788
>                 URL: https://issues.apache.org/jira/browse/DERBY-3788
>             Project: Derby
>          Issue Type: New Feature
>          Components: Performance
>    Affects Versions: 10.5.0.0
>            Reporter: Mamta A. Satoor
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, 
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, 
> DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, DERBY_3788_Mgr.java, 
> DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *********************
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *********************
> *********************
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 100000 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *********************

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

Reply via email to