[
https://issues.apache.org/jira/browse/DERBY-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kristian Waagan updated DERBY-4771:
-----------------------------------
Attachment: derby-4771-2g-prototype_lcc_code_dump.diff
Attached patch 2g.
Addressing Dag's first set of comments from 01/Dec/10 05:23 PM (in order).
M java/engine/org/apache/derby/impl/sql/compile/CursorNode.java
* I removed the TODO comment and created method collectTablePrivsAndStats.
Also reformatted the existing comment somewhat.
* Added comment. A FromBaseTable can represent several types of sources,
for instance system tables, subqueries, and VTIs. It can also represent a
view, but FromBaseTable nodes representing views are rewritten to
FromSubqueries during binding according to the FromBaseTable class JavaDoc.
* Now using no-arg constructor.
* Added comment and made the for-loop iterate backwards too. It is very likely
that statistics are mostly up-to-date, so removing from the end of the list
saves some copies. On the other hand, the list will mostly be very small,
which is why I'm not creating a second list to copy relevant table
descriptors to.
Renamed method in TableDescriptor to getAndClearIndexStatsIsUpToDate.
* Yes, comment removed.
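To illustrate the backwards iteration mentioned above, here is a minimal sketch. It is not the code from the patch: the Table class and the keepOnlyStale method are hypothetical stand-ins, and the real TableDescriptor accessor is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;

final class BackwardRemovalSketch {

    /** Hypothetical stand-in for a table descriptor; not Derby's TableDescriptor. */
    static final class Table {
        final String name;
        final boolean statsUpToDate;

        Table(String name, boolean statsUpToDate) {
            this.name = name;
            this.statsUpToDate = statsUpToDate;
        }
    }

    /** Removes, in place, every table whose statistics are already up to date. */
    static void keepOnlyStale(List<Table> tables) {
        // Walking from the end means a removal never shifts elements that
        // have not been visited yet, and removing near the tail of an
        // ArrayList copies fewer elements than removing near the head.
        for (int i = tables.size() - 1; i >= 0; i--) {
            if (tables.get(i).statsUpToDate) {
                tables.remove(i);
            }
        }
    }

    public static void main(String[] args) {
        List<Table> tables = new ArrayList<>();
        tables.add(new Table("T1", true));
        tables.add(new Table("T2", false));
        tables.add(new Table("T3", true));
        keepOnlyStale(tables);
        for (Table t : tables) {
            System.out.println(t.name);
        }
    }
}
```

Since the list is expected to be very small, removing in place like this avoids allocating a second list.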
M java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java
* I decided to rewrite the code, which relocates the use of the constant.
It will now be handled inside TableDescriptor.markForIndexStatsUpdate.
See also the comment at the bottom.
M java/engine/org/apache/derby/impl/sql/catalog/DataDictionaryImpl.java
* I'm keeping this temporary code for now, but I have changed it.
If the user hasn't explicitly specified the logging property, it will be set
to true. If explicitly specified by the user, it won't be overridden.
This code should be removed before going into a release, though, and then I
guess logging will default to false.
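The "default to true unless the user has specified a value" behaviour can be sketched as follows. This is an illustration only: the property key below is a made-up placeholder, not the key used in DataDictionaryImpl, and the helper names are hypothetical.

```java
import java.util.Properties;

final class LoggingDefaultSketch {

    // Hypothetical property key, used only for this illustration.
    static final String LOG_KEY = "example.indexStats.log";

    /** Turns logging on only when the user has not chosen a value. */
    static void applyTemporaryDefault(Properties props) {
        if (props.getProperty(LOG_KEY) == null) {
            props.setProperty(LOG_KEY, "true");
        }
    }

    /** Returns true only if the property is present and equals "true". */
    static boolean loggingEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(LOG_KEY));
    }

    public static void main(String[] args) {
        Properties unset = new Properties();
        applyTemporaryDefault(unset);
        System.out.println(loggingEnabled(unset));

        Properties explicit = new Properties();
        explicit.setProperty(LOG_KEY, "false");
        applyTemporaryDefault(explicit);
        System.out.println(loggingEnabled(explicit));
    }
}
```

Removing the call to applyTemporaryDefault would give the "default to false" behaviour expected for a release.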
A java/engine/org/apache/derby/impl/services/daemon/IndexStatisticsDaemonImpl.java
M java/engine/org/apache/derby/impl/db/BasicDatabase.java
M java/engine/org/apache/derby/iapi/sql/dictionary/TableDescriptor.java
* TBD: I'll get back to this later; I have rewritten the code a bit (ditching
the use of negative row count estimates).
* I removed the TODO comment. I added it when I discovered that the row count
estimate can get "out of sync" due to how Derby is updating it. It will only
happen in certain circumstances, and based on the comments for DERBY-2949 it
looks like Knut hit the same problem. He also says that it might be possible
to improve the logic.
Added JavaDoc.
M java/engine/org/apache/derby/iapi/sql/dictionary/DataDictionary.java
* Various comments from Dag here:
o The daemon can be disabled at runtime if it experiences severe errors. Note
that if the user doesn't want to have the daemon running, he/she would
disable it by setting a (system-wide or database) property.
o Renamed methods.
o Not sure what I'm supposed to do about this comment, but it is correct :)
* TBD: I'll address the comment about errors in the next iteration.
* Fixed typo.
I decided to add four properties to aid debugging and development. These
properties are (with current defaults):
a) derby.storage.indexStats.debug.createThreshold (100)
b) derby.storage.indexStats.debug.absdiffThreshold (1000)
c) derby.storage.indexStats.debug.lndiffThreshold (1.0)
d) derby.storage.indexStats.debug.queueSize (5)
(a) determines how big a table must be before statistics are automatically
created. (b) determines how big the absolute difference between the row
estimates for the table and the index must be before the statistics are
updated. (c) determines how big the logarithmic difference (natural logarithm)
between those estimates must be before the statistics are updated. The values
of these properties are printed if tracing is turned on. Now:
Q: I don't understand these properties!
A: Read the code ;)
These properties are made available for experimentation and debugging
only. a-c affect when statistics are created or updated, and are used in
TableDescriptor. (d) is only used in IndexStatisticsDaemonImpl.
Q: Why have both (a) and (b)?
A: Purely for debugging and experimentation. If these properties are included
in production code, I expect they can be folded into one.
Q: Why have both (b) and (c)?
A: In general (c) will decide if the statistics are updated. However, for
small tables (c) will cause frequent updates of the statistics. For small
tables accurate statistics are not needed for good performance [1], so
there is no reason to frequently update the stats. This is where (b) comes
into play.
[1] One exception might be if the rows are huge.
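For readers who prefer code to prose, here is a rough sketch of how (a), (b) and (c) could interact. It is not the logic in TableDescriptor. The property names and defaults are the ones listed above, but the class, the method names, the strict ">" comparisons, and the way (b) and (c) are combined are all assumptions made for illustration.

```java
final class StatsThresholdSketch {

    static final String CREATE_KEY =
            "derby.storage.indexStats.debug.createThreshold";
    static final String ABSDIFF_KEY =
            "derby.storage.indexStats.debug.absdiffThreshold";
    static final String LNDIFF_KEY =
            "derby.storage.indexStats.debug.lndiffThreshold";

    // Read each property as a system property, falling back to the defaults
    // given in the comment. Reading them this way is an assumption.
    static long createThreshold() {
        return Long.getLong(CREATE_KEY, 100L);
    }

    static long absdiffThreshold() {
        return Long.getLong(ABSDIFF_KEY, 1000L);
    }

    static double lndiffThreshold() {
        return Double.parseDouble(System.getProperty(LNDIFF_KEY, "1.0"));
    }

    /** (a): create statistics only once the table is large enough. */
    static boolean shouldCreate(long tableRows) {
        return tableRows > createThreshold();
    }

    /** (b) and (c): update only when both differences are large enough. */
    static boolean shouldUpdate(long tableRows, long indexRows) {
        // (b) filters out small tables, where the logarithmic test alone
        // would trigger updates far too often.
        long absDiff = Math.abs(tableRows - indexRows);
        if (absDiff <= absdiffThreshold()) {
            return false;
        }
        // (c) compares the estimates on a natural-log scale; clamp to 1 so
        // that a zero estimate does not produce ln(0).
        double lnDiff = Math.abs(Math.log(Math.max(1L, tableRows))
                - Math.log(Math.max(1L, indexRows)));
        return lnDiff > lndiffThreshold();
    }

    public static void main(String[] args) {
        System.out.println(shouldUpdate(10, 100));
        System.out.println(shouldUpdate(10_000, 100_000));
        System.out.println(shouldUpdate(100_000, 101_500));
    }
}
```

With the defaults, a table going from 10 to 100 rows differs by a large factor but only 90 rows, so under this sketch (b) suppresses the update; a table going from 100,000 to 101,500 rows passes (b) but not (c).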
Note that I have two outstanding comments from Dag (marked TBD), and ten TODOs
left. Four of these won't go away until later. The remaining six I'll try to
address in the next iteration.
> Continue investigation of automatic creation/update of index statistics
> -----------------------------------------------------------------------
>
> Key: DERBY-4771
> URL: https://issues.apache.org/jira/browse/DERBY-4771
> Project: Derby
> Issue Type: Task
> Components: SQL, Store
> Affects Versions: 10.8.0.0
> Reporter: Kristian Waagan
> Assignee: Kristian Waagan
> Attachments: autoindexstats.html,
> derby-4771-1a-prototype_code_dump.diff,
> derby-4771-1a-prototype_code_dump.stat,
> derby-4771-1b-prototype_code_dump.diff,
> derby-4771-2a-prototype_lcc_code_dump.diff,
> derby-4771-2b-prototype_lcc_code_dump.diff,
> derby-4771-2c-prototype_lcc_code_dump.diff,
> derby-4771-2d-prototype_lcc_code_dump.diff, DERBY-4771-2e-prototype.rar,
> derby-4771-2e-prototype_lcc_code_dump.diff,
> Derby-4771-2f-AutomaticIndexStatisticsTest_wondows7.rar,
> derby-4771-2f-prototype_lcc_code_dump-WORK-IN-PROGRESS.diff,
> derby-4771-2f-prototype_lcc_code_dump.diff,
> derby-4771-2g-prototype_lcc_code_dump.diff, derby.log, error-stacktrace.out,
> rjall.out, rjall.out, rjall.out, rjall.rar, rjone.out
>
>
> Work was started to improve Derby's handling of index statistics. This issue
> tracks further discussion and work for this task.