[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

Mamta A. Satoor (JIRA) Mon, 26 Jul 2010 09:19:17 -0700

    [ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892343#action_12892343
 ]


Mamta A. Satoor commented on DERBY-3788:
----------------------------------------

Just wanted to summarize what has been tried for this jira so far and a 
proposal for a possible solution worth trying
1)First I tried to see if the missing stats could be created inline when the 
SELECT query compilation finds that missing but that failed because SELECT 
query compilation marked the data dictionary in read only mode where as update 
statistics needs to update the data dictionary(this marking of read only for 
data dictionary happens in GenericStatement.prepMinion). Because of this 
read-only marking, we can't create the statistics inline of the compilation 
phase of the SELECT query. 
2)Second approach was during the SELECT compile phase, if we come across 
missing stats, then schedule a task for later to update the stats. Finish the 
SELECT compilation and execution after the update stat task is scheduled on the 
queue (the scheduling mechanism used is only available with jdk 1.5 and 
higher). When the scheduled task was later fired, it would run no current 
connection exception. The reason for that was that there was no context set for 
update statistics to run. Tried resolving this by creating a connection context 
through following call InternalDriver.activeDriver().connect(url, info) I had 
the SELECT compilation pass the needed information to the background task so I 
can create the connection. This is pretty crude way of getting a connection to 
update the statistics. Along with the fact that the code to create the 
connection context was not very modular, some of the existing tests failed with 
this changes because of locking issues. This probably is because the locks held 
by the background thread for update stats interfered with locks required by the 
rest of the test. 
3)The third approach I tried was when the SELECT query detects missing stats, 
abort the query compilation, (thus removing the data dictionary from read-only 
mode), create the stats and then restart the original SELECT query compilation 
and execution. The hope with this approach was that it will avoid running into 
locking issues. If we decide to pursue this approach more, we need to make sure 
that we look at the possibility of stats still missing when the original query 
is compile the 2nd time. This can happen say because the user does not have 
privileges to update the stats, the update stat ran into locking issues. If we 
won't account for possibility of missing stats during the 2nd time through the 
original query, we can end up in an infinite loop. Another problem with this 
approach was that update stats were happening in the same transaction as the 
original SELECT query. Which means that the locks acquired by the update stat 
will not be released until the transaction is committed. Came across locking 
issues with this when I ran the existing junit and derbyall tests. One way to 
fix this could be to start a nested user transaction and do the update stat in 
that transaction and commit that transaction(I think something like this is 
done for identity columns. If yes, then we might find pointers there to use for 
update stats in nested user transaction). Thus any locks acquired by the update 
stat work will be released. And after that, go back to the original query again 
for compilation and execution. 
4)With any approach we take, we should detect the case where the table is empty 
and hence the stats might be missing. For such a case, we should not try to 
fire stats creation task. to rephrase, during the compile phase of the SELECT 
query, when we look for statistics, first check if the table is empty and if 
yes then skip the code for collecting stats. From what I recall, one can check 
if the table is empty by issuing open scan controller and asking it for the 
number of rows in the table. If we do not check for empty table, then we will 
end up in infinite loop. 
5)For read-only db we should detect that we are readonly and don't try to do 
update statistics 
6)Also, we probably want to create statistics only for user tables? Haven't 
thought enough about it. 
7)What if the user executing the query is not the owner of the table. should 
statistics still be created? I think if the user does not have enough 
privileges, then skip the step of creating stats since it's going to fail 
anyways. 
8)Once statistics are created, notify tabledescriptor about it because 
tabledescriptor  cahces available statistics and it will never know of the new 
statistics since tabledescriptor holds on to that cache. 
9)Once we have update stats working, another improvement could be to identify 
the cached compiled queries that may benefit from updated stats and have those 
queries recompile when they are executed next time. In other words, the 
relevant queries should be invalidated as part of the statistics collection 
task. 
10)Look at other scenarios(in addition to step 7) above) where we know that 
update stats might fail. For those cases, don't spend time trying to create the 
stats. Instead just let the optimizer work with the information it has at the 
moment to come up with the query plan. 

> Provide a zero-admin way of updating the statisitcs of an index
> ---------------------------------------------------------------
>
>                 Key: DERBY-3788
>                 URL: https://issues.apache.org/jira/browse/DERBY-3788
>             Project: Derby
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 10.5.1.1
>            Reporter: Mamta A. Satoor
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, 
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, 
> DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, 
> DERBY3788_update_all_missing_stats_after_bind_patch4_diff.txt, 
> DERBY3788_update_all_missing_stats_after_bind_patch4_stat.txt, 
> DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *********************
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *********************
> *********************
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 100000 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *********************

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

Reply via email to