[
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892343#action_12892343
]
Mamta A. Satoor commented on DERBY-3788:
----------------------------------------
Just wanted to summarize what has been tried for this jira so far and a
proposal for a possible solution worth trying
1)First I tried to see if the missing stats could be created inline when the
SELECT query compilation finds that missing but that failed because SELECT
query compilation marked the data dictionary in read only mode where as update
statistics needs to update the data dictionary(this marking of read only for
data dictionary happens in GenericStatement.prepMinion). Because of this
read-only marking, we can't create the statistics inline of the compilation
phase of the SELECT query.
2)Second approach was during the SELECT compile phase, if we come across
missing stats, then schedule a task for later to update the stats. Finish the
SELECT compilation and execution after the update stat task is scheduled on the
queue (the scheduling mechanism used is only available with jdk 1.5 and
higher). When the scheduled task was later fired, it would run no current
connection exception. The reason for that was that there was no context set for
update statistics to run. Tried resolving this by creating a connection context
through following call InternalDriver.activeDriver().connect(url, info) I had
the SELECT compilation pass the needed information to the background task so I
can create the connection. This is pretty crude way of getting a connection to
update the statistics. Along with the fact that the code to create the
connection context was not very modular, some of the existing tests failed with
this changes because of locking issues. This probably is because the locks held
by the background thread for update stats interfered with locks required by the
rest of the test.
3)The third approach I tried was when the SELECT query detects missing stats,
abort the query compilation, (thus removing the data dictionary from read-only
mode), create the stats and then restart the original SELECT query compilation
and execution. The hope with this approach was that it will avoid running into
locking issues. If we decide to pursue this approach more, we need to make sure
that we look at the possibility of stats still missing when the original query
is compile the 2nd time. This can happen say because the user does not have
privileges to update the stats, the update stat ran into locking issues. If we
won't account for possibility of missing stats during the 2nd time through the
original query, we can end up in an infinite loop. Another problem with this
approach was that update stats were happening in the same transaction as the
original SELECT query. Which means that the locks acquired by the update stat
will not be released until the transaction is committed. Came across locking
issues with this when I ran the existing junit and derbyall tests. One way to
fix this could be to start a nested user transaction and do the update stat in
that transaction and commit that transaction(I think something like this is
done for identity columns. If yes, then we might find pointers there to use for
update stats in nested user transaction). Thus any locks acquired by the update
stat work will be released. And after that, go back to the original query again
for compilation and execution.
4)With any approach we take, we should detect the case where the table is empty
and hence the stats might be missing. For such a case, we should not try to
fire stats creation task. to rephrase, during the compile phase of the SELECT
query, when we look for statistics, first check if the table is empty and if
yes then skip the code for collecting stats. From what I recall, one can check
if the table is empty by issuing open scan controller and asking it for the
number of rows in the table. If we do not check for empty table, then we will
end up in infinite loop.
5)For read-only db we should detect that we are readonly and don't try to do
update statistics
6)Also, we probably want to create statistics only for user tables? Haven't
thought enough about it.
7)What if the user executing the query is not the owner of the table. should
statistics still be created? I think if the user does not have enough
privileges, then skip the step of creating stats since it's going to fail
anyways.
8)Once statistics are created, notify tabledescriptor about it because
tabledescriptor cahces available statistics and it will never know of the new
statistics since tabledescriptor holds on to that cache.
9)Once we have update stats working, another improvement could be to identify
the cached compiled queries that may benefit from updated stats and have those
queries recompile when they are executed next time. In other words, the
relevant queries should be invalidated as part of the statistics collection
task.
10)Look at other scenarios(in addition to step 7) above) where we know that
update stats might fail. For those cases, don't spend time trying to create the
stats. Instead just let the optimizer work with the information it has at the
moment to come up with the query plan.
> Provide a zero-admin way of updating the statisitcs of an index
> ---------------------------------------------------------------
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
> Issue Type: Improvement
> Components: Store
> Affects Versions: 10.5.1.1
> Reporter: Mamta A. Satoor
> Assignee: Mamta A. Satoor
> Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt,
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt,
> DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt,
> DERBY3788_update_all_missing_stats_after_bind_patch4_diff.txt,
> DERBY3788_update_all_missing_stats_after_bind_patch4_stat.txt,
> DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good
> for Derby to provide an automatic way of updating the statistics without
> requiring to run the stored procedure manually. There was some discussion on
> DERBY-269 about providing the 0-admin way. I have copied it here for
> reference.
> *********************
> Kathey Marsden - 22/May/05 03:53 PM
> Some sort of zero admin solution for updating statistics would be prefferable
> to the manual 'update statistics'
> *********************
> *********************
> Mike Matrigali - 11/Jun/08 12:37 PM
> I have not seen any other suggestions, how about the following zero admin
> solution? It is not perfect - suggestions welcome.
> Along with the statistics storing, save how many rows were in the table when
> exact statistics were calculated. This number is 0 if none have been
> calculated because index creation happened on an empty table. At query
> compile time when we look up statistics we automatically recalculate the
> statistics at certain threshholds - say something like row count growing past
> next threshhold : 10, 100, 1000, 100000 - with upper limit being somewhere
> around how many rows we can process in some small amount of time - like 1
> second on a modern laptop. If we are worried about response time, maybe we
> background queue the stat gathering rather than waiting with maybe some quick
> load if no stat has ever been gathered. The background gathering could be
> optimized to not interfere with locks by using read uncommitted.
> I think it would be useful to also have the manual call just to make it easy
> to support customers and debug issues in the field. There is proably always
> some dynamic data distribution change that in some case won't be picked up by
> the automatic algorithm. Also just very useful for those who have complete
> control of the create ddl, load data, run stats, deliver application process.
> *********************
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.