Hello All,
I work for TokuTek as a Storage Engine developer and I have been using
the, relatively, new optimizer trace functionality. I have found the
results very helpful when combined with analyzing the source code, but
there a few questions I want to ask. Here is some output from the
trace for a recent SELECT query on an InnoDB table:
At the end of the trace, it shows the index chosen by, what I presume,
is the optimizer:
...
"considered_execution_plans":
[
{
"plan_prefix": [],
"table": "`reports` `a`",
"best_access_path":
{
"considered_access_paths": [
{
"access_type": "ref",
"index": "reports_group_id",
"rows": 1.55e6,
"cost": 411057,
"chosen": true
},
...
]
},
"cost_for_plan": 411057,
"rows_for_plan": 1.55e6,
"chosen": true
}
]
},
...
Here are the cost results for some rejected alternative indexes,
(note: the row count for two of them is similar to the chosen index):
...
"analyzing_range_alternatives": {
"range_scan_alternatives"
...
"rows": 2377102,
"cost": 2.81e6,
...
"rows": 1554164,
"cost": 1.84e6,
...
"rows": 2346664,
"cost": 2.78e6,
...
"rows": 1554164,
"cost": 1.84e6,
...
Here is the cost analysis for a table scan, from the beginning of the
trace (still not as good as the above chosen index, but close on
cost):
...
"table": "`reports` `a`",
"range_analysis":
{
"table_scan":
{
"rows": 3108329,
"cost": 655076
},
...
Related to the fact that the table scan analysis is dominated by the
large row count, and probably rejected as a valid plan compared to the
chosen index for that very reason:
1. How is the row count more than merely informational to the
optimizer? I had the impression the row count did not directly
contribute to the optimizer's decision as much as the estimated cost,
or that the row count was already accounted for/factored into in the
final cost measurement.
2. Related to that question, what is cost exactly? Is there a unit
of measurement (ex: Time, Disk Seeks, etc.)?
3. Is cost more important than row count or equally important?
4. What is the genesis of the cost variable displayed in the trace?
I assume it is just a guess on the respective storage engine's part of
how expensive (in terms of disk access, processing, time, etc.)
getting the rows will be for the query.
thanks in advance,
Christian
_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to : [email protected]
Unsubscribe : https://launchpad.net/~maria-developers
More help : https://help.launchpad.net/ListHelp