[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mamta A. Satoor updated DERBY-3788:
-----------------------------------

    Attachment: DERBY3788_update_all_missing_stats_after_bind_patch4_diff.txt
                DERBY3788_update_all_missing_stats_after_bind_patch4_stat.txt

I have an intermediate patch ready for review. The goal of the patch is to 
detect early on in the compile phase of a SELECT query if there are tables 
involved with missing statistics. If yes, then abandon the SELECT query 
compilation, create the missing statistics and then try to compile the original 
SELECT query again. This logic gets driven inside the 
GenericStatement.prepMinion code. GenericStatement.prepMinion is responsible 
for compile phase of a SQL. If the SQL involved is a SELECT statement, then we 
may go through the compile process of the SELECT query twice. If the  SQL 
involved is non-SELECT, we will finish SQL compilation in the first pass. The 
details about the code flow for SELECT statement compilation is as follows.

For a SELECT statement, we may have to go through the query compilation phase 
twice. Whether we go through one cycle or 2 cycles of compilation phase depends 
on wheter there are any tables involved in the query who have their statistics 
missing. During the first attempt of query compile, we will make a list of all 
the tables involved in the query during the bind phase of compile. After the 
bind phase, during the first pass through the SELECT query compilation, we will 
check if the query involved has any tables whose statistics are missing. If 
yes, then we will quit from the SELECT compile phase by throwing missing 
statistics exception. We will handle the missing stats exception by trying to 
update the missing statistics. After that, we will go through the second pass 
of original query compilation. During this second pass, we will not worry if 
the statistics are missing or not. We will just work with whatever statistics 
are available (one example where the statistics may still be missing after 
trying to create them is say a user who only has select privileges on a table 
finds the missing stats. But the statistics creation will fail because the user 
does not have sufficient privileges to create the statistics. There can be 
other reasons for statistics creation failure too. Another instance would be 
that we can't get the necessary locks to update the stats. In such a case, we 
will just quit out of updating the stats and move on to the second pass of the 
original query compilation. 

One of the issues with this patch I need to work on and will appreicate if 
anyone has any feedback on. The update statistics is happening in the user 
transaction. What that means is that any locks required to update statistics 
will stay in place until the user transaction has been committed/rolled back. 
This behavior can be an issue with user applications (since these are the locks 
the user was not expecting to get as part of a SELECT query execution). This 
behavior definitely causes problems with our tests where quite a handful of 
tests run into locking issues because of the additional locks acquired before 
of a SELECT query. I think one way to fix this would be to somehow run the 
statistics in a nested transaction which can be committed after the statistics 
are created. If we run into locking issues in the nested transaction then go 
ahead and run the statistic in the user transaction. This is what we do for 
GENERATED columns in InsertResultSet.getSetAutoincrementValue. One of the 
problems that I need to address before I can use nested transaction is to 
change the code that is written to fire statistics during SELECT compilation. 
Currently, in my patch, I am executing the stored procedure 
SYSCS_UTIL.SYSCS_UPDATE_STATISTICS which internally executes ALTER TABLE.. to 
update the statistics. I am at too high a level to be able to use nested 
transaction for updating the statistics (let me know if I am wrong here. May be 
there is a way to use nested transaction while firing the 
SYSCS_UTIL.SYSCS_UPDATE_STATISTICS that I am unaware of). I think I need to get 
to update statistics code in AlterTableConstantAction directly somehow without 
going through the stored procedure->ALTER TABLE.. sql path. I think if I can 
directly call the update statistics in AlterTableConstantAction, then I can try 
using the nested yransaction and if that does not succeed then user the user 
transaction.

I will appreciate any input you might have on my overall approach to this jira 
entry and then how to address the locking issue.

> Provide a zero-admin way of updating the statisitcs of an index
> ---------------------------------------------------------------
>
>                 Key: DERBY-3788
>                 URL: https://issues.apache.org/jira/browse/DERBY-3788
>             Project: Derby
>          Issue Type: New Feature
>          Components: Performance
>    Affects Versions: 10.5.0.0
>            Reporter: Mamta A. Satoor
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, 
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, 
> DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, 
> DERBY3788_update_all_missing_stats_after_bind_patch4_diff.txt, 
> DERBY3788_update_all_missing_stats_after_bind_patch4_stat.txt, 
> DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *********************
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *********************
> *********************
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 100000 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *********************

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to