[jira] Updated: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

Mamta A. Satoor (JIRA) Wed, 27 Aug 2008 11:40:14 -0700

     [ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Mamta A. Satoor updated DERBY-3788:
-----------------------------------

    Attachment: DERBY_3788_Repro.java
                DERBY_3788_Mgr.java

Based on the feedback from Mike, I am thinking of pursuing option 3): "give up 
on updating during compile and just use default, but schedule update after 
compile and somehow mark query to be recompiled again the next time (this might 
automatically happen if update of statistic already causes dependency to change 
on compiled query - i don't know)."

I will concentrate on the first part of option 3, which is to schedule an 
update of the statistics after query compilation if the compile finds that the 
required statistics are not available. As recommended by Dan on the Derby dev 
list 
(http://www.nabble.com/higher-level-background-work-in-the-derby-server,-where-should-it-go--td18950929.html),
I am trying to use classes in java.util.concurrent rather than some home-built 
background job mechanism.

To break the first part of option 3 into mini-steps, I have written a "Hello 
World" program using the java.util.concurrent package. It is completely outside 
the Derby code, just a standalone Java program. There are two Java source 
files (attached to this JIRA entry for reference). One (DERBY_3788_Mgr) creates 
the ThreadPoolExecutor and then adds 30 Hello World background tasks 
(DERBY_3788_Repro). To run it, use the following command:
java org.apache.derbyTesting.functionTests.tests.jdbc4.DERBY_3788_Mgr
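
For readers without the attachments, a program of this kind might look roughly 
like the sketch below. This is not the attached code: the class names 
HelloWorldTask and HelloWorldManager, the pool sizes, and the timeouts are all 
assumptions made for illustration.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for DERBY_3788_Repro: one "Hello World" background task.
class HelloWorldTask implements Runnable {
    private final int id;
    private final AtomicInteger completed;

    HelloWorldTask(int id, AtomicInteger completed) {
        this.id = id;
        this.completed = completed;
    }

    public void run() {
        System.out.println("Hello World from task " + id
                + " on " + Thread.currentThread().getName());
        completed.incrementAndGet();
    }
}

// Hypothetical stand-in for DERBY_3788_Mgr: creates the pool and queues the tasks.
class HelloWorldManager {
    // Queues taskCount tasks, waits for them to finish, and returns how many ran.
    static int runTasks(int taskCount) {
        AtomicInteger completed = new AtomicInteger();
        // Assumed sizing: 2 core threads, 4 max, 10 s idle timeout. With an
        // unbounded queue the pool stays at the core size and never rejects a task.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 10, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        for (int i = 1; i <= taskCount; i++) {
            pool.execute(new HelloWorldTask(i, completed));
        }
        pool.shutdown(); // accept no new tasks, but finish the queued ones
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(30) + " tasks completed");
    }
}
```

The order of the "Hello World" lines varies from run to run because the tasks 
are spread across the pool's worker threads.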

One thing to note is that java.util.concurrent.ThreadPoolExecutor was 
introduced in JDK 1.5, so this program needs JDK 1.5 or higher to run. We will 
have to discuss what we want to do for JDKs prior to 1.5.

As the next mini-step, I am going to see how I can put this ThreadPoolExecutor 
framework into our codeline at the point where we detect that statistics are 
not available. At this step, the background thread will just do some sort of 
println to indicate that an update of the statistics is required. Once that is 
in place, I will try to see how I can actually fire the update statistics code 
from the background threads.
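
As a rough illustration of this mini-step, the hand-off from the compile path 
to a background thread could be shaped like the sketch below. Everything here 
is hypothetical: StatsUpdateScheduler, statisticsMissing, and the use of a 
single daemon worker thread are assumptions and not existing Derby code. The 
background task only prints, as described above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Derby.
class StatsUpdateScheduler {
    private final AtomicInteger requestsHandled = new AtomicInteger();

    // Assumption: one daemon worker thread, so queued work never keeps the JVM alive.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(
            new ThreadFactory() {
                public Thread newThread(Runnable r) {
                    Thread t = new Thread(r, "stats-updater");
                    t.setDaemon(true);
                    return t;
                }
            });

    // Would be called from the compile path when statistics are missing.
    // Returns immediately; the work itself runs on the background thread.
    void statisticsMissing(final String indexName) {
        worker.execute(new Runnable() {
            public void run() {
                // Placeholder for the real update statistics code.
                System.out.println("update statistics needed for index " + indexName);
                requestsHandled.incrementAndGet();
            }
        });
    }

    // Stops accepting requests, waits up to the timeout for queued ones,
    // and returns how many requests have been handled so far.
    int shutdownAndCount(long timeoutSeconds) {
        worker.shutdown();
        try {
            worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return requestsHandled.get();
    }
}
```

Because the compile path only enqueues a request, query compilation would not 
wait for the statistics work, which matches the "use default now, update 
later" idea in option 3.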

Any comments/feedback on what I have so far or what the plan is?


> Provide a zero-admin way of updating the statisitcs of an index
> ---------------------------------------------------------------
>
>                 Key: DERBY-3788
>                 URL: https://issues.apache.org/jira/browse/DERBY-3788
>             Project: Derby
>          Issue Type: New Feature
>          Components: Performance
>    Affects Versions: 10.5.0.0
>            Reporter: Mamta A. Satoor
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It would be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring the user to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *********************
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be preferable 
> to the manual 'update statistics'. 
> *********************
> *********************
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain thresholds - say something like row count growing past 
> next threshold: 10, 100, 1000, 100000 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is probably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *********************
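
The threshold check in Mike's quoted comment could be sketched as follows. 
This is only one reading of the proposal: the class and method names are 
hypothetical, the threshold values are copied as written in the comment, and 
treating "growing past" as "reaching or exceeding" is an assumption. The upper 
limit is simply the last listed value here, since the comment leaves the real 
limit open.

```java
// Hypothetical helper, not part of Derby.
class StatsThresholds {
    // Example thresholds as listed in the comment; the real values are undecided.
    static final long[] THRESHOLDS = {10L, 100L, 1000L, 100000L};

    // rowsAtLastStats: row count saved when statistics were last calculated
    //                  (0 if they were never calculated).
    // currentRows:     row count seen at query compile time.
    // Returns true when the table has reached a threshold that it was still
    // below at the time of the last statistics calculation.
    static boolean needsRecalculation(long rowsAtLastStats, long currentRows) {
        for (int i = 0; i < THRESHOLDS.length; i++) {
            long threshold = THRESHOLDS[i];
            if (rowsAtLastStats < threshold && currentRows >= threshold) {
                return true;
            }
        }
        return false;
    }
}
```

With these values, a table whose statistics were calculated at 10 rows would 
not trigger a recalculation at 99 rows but would at 100, and a table already 
measured at 100000 rows would never trigger one again.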

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
