[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statistics of an index

Mamta A. Satoor (JIRA) Thu, 28 Aug 2008 12:40:05 -0700

    [ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626713#action_12626713 ]


Mamta A. Satoor commented on DERBY-3788:
----------------------------------------

As mentioned in my comment on this JIRA issue yesterday, I am trying to use
ThreadPoolExecutor in the Derby codeline so that it can fire the update
statistics tasks in the background. ThreadPoolExecutor was added in JDK 1.5.
When I import this class into DataDictionary.java (I will refer to it as DD),
I of course need to make sure that DD.java gets compiled with 1.5 or higher.
But requiring the user to have 1.5 or higher at run time is not going to work,
because Derby must keep running on the older JDKs it supports.
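Deciding at run time whether the 1.5-only code may be used needs some check of the running JVM. A minimal sketch, assuming the `java.specification.version` system property is an acceptable signal (it reports "1.4", "1.5", ... up to Java 8 and "9", "11", ... afterwards); the class and method names are hypothetical and are not part of Derby's monitor code.

```java
// Hypothetical helper (not part of Derby): decide at run time whether the
// running JVM is Java 1.5 or later, based on java.specification.version.
public final class JdkLevel {

    private JdkLevel() {
    }

    /**
     * Returns true if specVersion ("1.4", "1.5", "1.8", "9", "17", ...) denotes
     * Java 1.5 or later. Unknown or malformed values are treated as "too old".
     */
    static boolean isAtLeastJava5(String specVersion) {
        if (specVersion == null || specVersion.length() == 0) {
            return false;
        }
        String[] parts = specVersion.split("\\.");
        try {
            int first = Integer.parseInt(parts[0]);
            if (first == 1) {
                // Old numbering scheme used up to Java 8: "1.<feature>"
                int feature = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
                return feature >= 5;
            }
            // Newer scheme from Java 9 on: "9", "11", "17", ...
            return first >= 5;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static boolean runningOnJava5OrLater() {
        return isAtLeastJava5(System.getProperty("java.specification.version"));
    }

    public static void main(String[] args) {
        System.out.println("Java 1.5 or later: " + runningOnJava5OrLater());
    }
}
```

Treating an unreadable version as "too old" errs on the side of the no-op path, which is the safe behaviour for the older JDKs.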

At this point, I am thinking of addressing this by having a subclass of DD,
called, say, DD5, which will be loaded by the monitor code if we know that we
are dealing with JDK 1.5 or higher. For earlier versions, *for now*, trying to
schedule update statistics tasks in the background will be a no-op. This way,
the new code will run (or rather, be a no-op on JDK 1.4 and lower) on all
supported JDK versions. If anyone has any feedback on my approach, please let
me know.
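The key point of the DD/DD5 split is that an older VM must never load the class that refers to java.util.concurrent. A minimal sketch of picking the subclass by name, assuming plain reflection stands in for Derby's monitor (which the comment relies on but does not describe); BaseDictionary and Jdk5Dictionary are hypothetical stand-ins for DD and DD5.

```java
// Hypothetical sketch: pick the DD5-style subclass by *name* at run time so the
// class that references java.util.concurrent is only loaded when allowed.
public class DictionaryLoader {

    /** Stand-in for DD: background scheduling is a no-op. */
    public static class BaseDictionary {
        public String schedulerKind() {
            return "no-op";
        }
    }

    /** Stand-in for DD5: the real class would use ThreadPoolExecutor. */
    public static class Jdk5Dictionary extends BaseDictionary {
        @Override
        public String schedulerKind() {
            return "ThreadPoolExecutor";
        }
    }

    /**
     * Loads jdk5ClassName reflectively when java5OrLater is true; otherwise, or
     * if loading fails for any reason, falls back to the no-op base class.
     */
    static BaseDictionary load(boolean java5OrLater, String jdk5ClassName) {
        if (java5OrLater) {
            try {
                Class<?> c = Class.forName(jdk5ClassName);
                return (BaseDictionary) c.getDeclaredConstructor().newInstance();
            } catch (Exception e) {
                // class missing, not instantiable, or not a BaseDictionary
            } catch (LinkageError e) {
                // class found but could not be linked on this VM
            }
        }
        return new BaseDictionary();
    }

    public static void main(String[] args) {
        // Real code would keep the name in a string constant instead of
        // touching Jdk5Dictionary.class, so the class is never referenced directly.
        String name = Jdk5Dictionary.class.getName();
        System.out.println(load(true, name).schedulerKind());  // ThreadPoolExecutor
        System.out.println(load(false, name).schedulerKind()); // no-op
    }
}
```

Falling back to the base class on any loading failure keeps the behaviour the comment describes: older JDKs simply get the no-op.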

Some background information: iapi.sql.dictionary.TableDescriptor has an
existing method called statisticsExist which returns true if the statistics
exist. This method is called during the query optimization phase and also by
AlterTableConstantAction. For the query optimization phase, we want to be able
to schedule update statistics tasks in the background if
TableDescriptor.statisticsExist() returns false. The following is the
pseudo-code I have in mind:

during the query optimization
    if (TableDescriptor.statisticsExist() == false) {
        // new method in TableDescriptor
        TableDescriptor.createStatisticsInBackGround(schemaname, tablename, indexname)
    }

TableDescriptor with new method
    createStatisticsInBackGround(...) {
        DD.scheduleUpdateStatisticstask(...)   // new method in DD
    }
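The two pseudo-code fragments above describe a simple delegation chain. A minimal runnable sketch of it, assuming an interface stands in for DD and a boolean flag stands in for whatever statisticsExist() actually checks; the method names are adapted from the pseudo-code (createStatisticsInBackground, scheduleUpdateStatisticsTask) and are not existing Derby APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed call chain:
// optimizer -> TableDescriptor.createStatisticsInBackground -> DD.scheduleUpdateStatisticsTask
public class StatsCallChain {

    /** Stand-in for the new DD method. */
    public interface StatisticsScheduler {
        void scheduleUpdateStatisticsTask(String schemaName, String tableName, String indexName);
    }

    /** Stand-in for TableDescriptor; only the parts the pseudo-code needs. */
    public static class TableDescriptorSketch {
        private final boolean statisticsExist;
        private final StatisticsScheduler dd;

        public TableDescriptorSketch(boolean statisticsExist, StatisticsScheduler dd) {
            this.statisticsExist = statisticsExist;
            this.dd = dd;
        }

        public boolean statisticsExist() {
            return statisticsExist;
        }

        /** New method: hand the work to the data dictionary. */
        public void createStatisticsInBackground(String schemaName, String tableName, String indexName) {
            dd.scheduleUpdateStatisticsTask(schemaName, tableName, indexName);
        }
    }

    /** What the optimizer would do when it looks at a table's statistics. */
    public static void onOptimize(TableDescriptorSketch td, String schemaName,
                                  String tableName, String indexName) {
        if (!td.statisticsExist()) {
            td.createStatisticsInBackground(schemaName, tableName, indexName);
        }
    }

    public static void main(String[] args) {
        final List<String> scheduled = new ArrayList<String>();
        StatisticsScheduler recorder = new StatisticsScheduler() {
            public void scheduleUpdateStatisticsTask(String s, String t, String i) {
                scheduled.add(s + "." + t + "." + i);
            }
        };
        onOptimize(new TableDescriptorSketch(false, recorder), "APP", "T1", "IDX1");
        onOptimize(new TableDescriptorSketch(true, recorder), "APP", "T2", "IDX2");
        System.out.println(scheduled); // [APP.T1.IDX1]
    }
}
```

Only the table without statistics results in a scheduled task; the optimizer itself never waits for the update.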

DD5 will do the real work of scheduling the tasks in the background. In DD
(i.e. when running with JDK 1.4 and lower), this method will be a no-op.

DD5 with new method (for jdk1.5 and higher)
    scheduleUpdateStatisticstask(...) {
           create ThreadPoolExecutor if it does not already exist
           queue update statistics task on the ThreadPoolExecutor
    }

DD with new method (for jdk1.4 and lower)
    scheduleUpdateStatisticstask(...) {
            no-op
    }
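A minimal sketch of the DD/DD5 pair, assuming the background work can be handed over as a Runnable (the comment leaves the parameters open as "..."); the class names are hypothetical stand-ins. It uses a single worker thread with an unbounded queue, so update statistics tasks run one at a time and the caller never blocks, and daemon threads so a pending task does not keep the JVM alive.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the DD / DD5 split (not Derby's actual classes).
public class BackgroundStatsScheduler {

    /** DD behaviour (JDK 1.4 and lower): scheduling is a no-op. */
    public static class BaseDictionary {
        public void scheduleUpdateStatisticsTask(Runnable task) {
            // no-op: statistics are not updated in the background
        }

        public void shutdown() {
        }
    }

    /** DD5 behaviour (JDK 1.5 and higher): queue tasks on a lazily created executor. */
    public static class Jdk5Dictionary extends BaseDictionary {
        private ThreadPoolExecutor executor; // guarded by "this"

        @Override
        public synchronized void scheduleUpdateStatisticsTask(Runnable task) {
            if (executor == null) {
                // Create the ThreadPoolExecutor only when the first task arrives.
                executor = new ThreadPoolExecutor(
                        1, 1,                                // exactly one background worker
                        0L, TimeUnit.MILLISECONDS,
                        new LinkedBlockingQueue<Runnable>(), // unbounded task queue
                        new ThreadFactory() {
                            public Thread newThread(Runnable r) {
                                Thread t = new Thread(r, "update-statistics");
                                t.setDaemon(true); // do not keep the JVM alive
                                return t;
                            }
                        });
            }
            executor.execute(task);
        }

        @Override
        public synchronized void shutdown() {
            if (executor != null) {
                executor.shutdown(); // already queued tasks still run
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final CountDownLatch done = new CountDownLatch(2);
        BaseDictionary dd = new Jdk5Dictionary();
        for (int i = 0; i < 2; i++) {
            dd.scheduleUpdateStatisticsTask(new Runnable() {
                public void run() {
                    done.countDown(); // a real task would recompute index statistics
                }
            });
        }
        System.out.println("tasks finished: " + done.await(5, TimeUnit.SECONDS));
        dd.shutdown();
    }
}
```

A single worker is a deliberate choice in this sketch: it bounds the extra load that background statistics gathering can put on the system.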


> Provide a zero-admin way of updating the statistics of an index
> ---------------------------------------------------------------
>
>                 Key: DERBY-3788
>                 URL: https://issues.apache.org/jira/browse/DERBY-3788
>             Project: Derby
>          Issue Type: New Feature
>          Components: Performance
>    Affects Versions: 10.5.0.0
>            Reporter: Mamta A. Satoor
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It would be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring users to run the stored procedure manually. There was some 
> discussion on DERBY-269 about providing the zero-admin way; I have copied it 
> here for reference.
> *********************
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero-admin solution for updating statistics would be preferable 
> to the manual 'update statistics'.
> *********************
> *********************
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, so how about the following zero-admin 
> solution? It is not perfect; suggestions welcome. 
> Along with the stored statistics, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because the index was created on an empty table. At query compile 
> time, when we look up statistics, we automatically recalculate them at 
> certain thresholds, say when the row count grows past the next threshold: 
> 10, 100, 1000, 100000, with the upper limit being somewhere around how many 
> rows we can process in a small amount of time, like 1 second on a modern 
> laptop. If we are worried about response time, maybe we queue the statistics 
> gathering in the background rather than waiting, perhaps with a quick load if 
> no statistics have ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would also be useful to keep the manual call, just to make it easy 
> to support customers and debug issues in the field. There is probably always 
> some dynamic data distribution change that in some cases won't be picked up 
> by the automatic algorithm. It is also very useful for those who have complete 
> control of the create DDL, load data, run stats, deliver application process. 
> *********************
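The threshold rule in the zero-admin proposal quoted above can be stated as a small function. A minimal sketch, assuming the row count at the last statistics computation is stored alongside the statistics as proposed; the threshold values are the proposal's illustrative ones, and treating the last threshold as the upper limit (no automatic recomputation beyond it) is an assumption of this sketch.

```java
// Hypothetical sketch of the proposed threshold rule; not Derby code.
public final class StatsRefreshPolicy {

    // Illustrative thresholds from the proposal; the last one acts as the upper limit.
    private static final long[] THRESHOLDS = {10L, 100L, 1000L, 100000L};

    private StatsRefreshPolicy() {
    }

    /** Smallest threshold above rowsAtLastStats, or -1 if none is left. */
    static long nextThreshold(long rowsAtLastStats) {
        for (long t : THRESHOLDS) {
            if (rowsAtLastStats < t) {
                return t;
            }
        }
        return -1L;
    }

    /**
     * rowsAtLastStats is 0 when no statistics were ever computed (index created
     * on an empty table). Recompute once the current row count grows past the
     * next threshold.
     */
    static boolean shouldRecompute(long rowsAtLastStats, long currentRows) {
        long next = nextThreshold(rowsAtLastStats);
        return next >= 0 && currentRows > next;
    }

    public static void main(String[] args) {
        System.out.println(shouldRecompute(0, 11));   // true: grew past 10
        System.out.println(shouldRecompute(11, 50));  // false: next threshold is 100
        System.out.println(shouldRecompute(11, 101)); // true: grew past 100
    }
}
```

Because the check only compares two stored numbers, it is cheap enough to run at every query compilation; the expensive part (the actual statistics gathering) is what the background queue would absorb.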

-- 
This message is automatically generated by JIRA.