[jira] Updated: (DERBY-4938) Implement istat scheduling/triggering

Kristian Waagan (JIRA) Wed, 09 Feb 2011 14:57:24 -0800

     [ 
https://issues.apache.org/jira/browse/DERBY-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Kristian Waagan updated DERBY-4938:
-----------------------------------

    Attachment: derby-4938-1a-istat_scheduling.diff
                derby-4938-1a-istat_scheduling.stat

Attaching patch 1a, which adds the initial scheduling logic.

Updates to, or creation of, the index cardinality statistics will only be 
triggered by prepared statements, and only when the query involves an access 
path using an index. In addition, there are thresholds that have to be reached 
or exceeded before an update is scheduled. These thresholds may have to be 
tweaked after a period of testing.

Note that DERBY-4939 has to be committed before the autostats are enabled, but 
here are some comments from DERBY-4771 about the available debug knobs for 
this feature:

-----
 a) derby.storage.indexStats.debug.createThreshold (100)
 b) derby.storage.indexStats.debug.absdiffThreshold (1000)
 c) derby.storage.indexStats.debug.lndiffThreshold (1.0)
 d) derby.storage.indexStats.debug.queueSize (5)

(a) determines how big a table must be before statistics are automatically
created. (b) determines how big the discrepancy between the row estimates for
the table and the index must be before the statistics are updated. (c)
determines how big the logarithmic difference (natural logarithm) must be before the
statistics are updated. The values of these properties are printed if tracing
is turned on. Now:

  Q: I don't understand these properties!
  A: Read the code ;)
     These properties are made available for experimentation and debugging
     only. (a)-(c) affect when statistics are created or updated, and are used in
     TableDescriptor. (d) is only used in IndexStatisticsDaemonImpl.

  Q: Why have both (a) and (b)?
  A: Purely for debugging and experimentation. If these properties are included
     in production code, I expect they can be folded into one.

  Q: Why have both (b) and (c)?
  A: In general (c) will decide if the statistics are updated. However, for
     small tables (c) will cause frequent updates of the statistics. For small
     tables accurate statistics are not needed for good performance [1], so
     there is no reason to frequently update the stats. This is where (b) comes
     into play.

[1] One exception might be if the rows are huge.
-----
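
To make the interplay between (a), (b) and (c) a bit more concrete, here is a
minimal sketch of how such a check could look. It is not the code from
TableDescriptor: the class name, the method names and the exact formulas are
assumptions made for illustration, and so is the choice of ">=" for
"reached/exceeded". Only the default values (100, 1000, 1.0) come from the
property list above.

```java
// Hypothetical sketch of the threshold checks; not the actual Derby code.
class IstatThresholdSketch {
    // Defaults taken from the debug properties listed above.
    static final long CREATE_THRESHOLD = 100;    // (a) createThreshold
    static final long ABSDIFF_THRESHOLD = 1000;  // (b) absdiffThreshold
    static final double LNDIFF_THRESHOLD = 1.0;  // (c) lndiffThreshold

    // (a): only create statistics once the table is big enough.
    static boolean shouldCreate(long tableRows) {
        return tableRows >= CREATE_THRESHOLD;
    }

    // (b) and (c): update only if both the absolute and the logarithmic
    // difference between the two row estimates reach their thresholds.
    static boolean shouldUpdate(long tableRows, long indexRows) {
        long absDiff = Math.abs(tableRows - indexRows);
        if (absDiff < ABSDIFF_THRESHOLD) {
            // Small tables: (b) suppresses the frequent updates that (c)
            // alone would cause.
            return false;
        }
        // Assumed formula: |ln(table) - ln(index)|, with estimates clamped
        // to 1 to avoid taking the logarithm of zero.
        double lnDiff = Math.abs(
                Math.log(Math.max(tableRows, 1L))
                - Math.log(Math.max(indexRows, 1L)));
        return lnDiff >= LNDIFF_THRESHOLD;
    }

    public static void main(String[] args) {
        // Small table: large relative change, small absolute change.
        System.out.println(shouldUpdate(100, 10));
        // Large table: large absolute change, small relative change.
        System.out.println(shouldUpdate(100000, 50000));
        // Large table: both thresholds reached.
        System.out.println(shouldUpdate(100000, 10000));
    }
}
```

In this sketch a table going from 10 to 100 rows does not trigger an update
even though the logarithmic difference is well above 1.0, which is the
small-table case described for (b) above.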

Committed to trunk with revision 1069160.

> Implement istat scheduling/triggering
> -------------------------------------
>
>                 Key: DERBY-4938
>                 URL: https://issues.apache.org/jira/browse/DERBY-4938
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 10.8.0.0
>            Reporter: Kristian Waagan
>            Assignee: Kristian Waagan
>         Attachments: derby-4938-1a-istat_scheduling.diff, 
> derby-4938-1a-istat_scheduling.stat
>
>
> The istat daemon has to get its orders from somewhere (it is not operating 
> purely on its own), and this issue tracks the addition of code that will 
> schedule units of work with the daemon. 
> The current approach is based on statement compilation, i.e. prepared 
> statements, triggering the addition of units of work.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
