GitHub user DaveBirdsall opened a pull request:
https://github.com/apache/trafodion/pull/1443
[TRAFODION-2645] Fixes to MDAM costing code found during inner join tests
This set of changes fixes some issues that were found in the new MDAM
costing code while testing it for the case of an inner table of a nested join.
The fixes are:
1. We were way too pessimistic in calculating the number of seeks. We
assumed we would do a seek per block, but it is obvious that the number of
seeks is bounded from above by the number of subsets. This showed up when
testing queries with tiny numbers of probes (4) but large numbers of rows per
sequential subset (over 100 blocks worth in total).
2. We were applying some rowsize adjustment factors twice, once in
NewMDAMCostWA::compute, and again in
FileScanOptimizer::scmComputeMDAMCostForHbase, which it calls. The duplicate
logic was removed in NewMDAMCostWA::compute.
3. We were overestimating the number of fetch subset rows when costing MDAM
for the inner table of a nested join. It turns out that the histogram handling
does a cross product scale-up of the histograms in order to calculate the
effect of a join predicate. But this scale-up looks at things from the
standpoint of the complete result of the query, while the MDAM costing logic is
looking at things from the standpoint of a single outer table join probe.
Dividing fetch row counts by the number of outer table join probes corrects the
problem. Then at a higher level we multiply by the number of outer table join
probes (covering both MDAM probes and MDAM fetches) to calculate the total cost
of the MDAM scan.
Notes: The new MDAM costing code is still off by default. To turn it on,
use CQD MDAM_COSTING_REWRITE 'ON'. To test MDAM on the inner table of nested
joins, one must either force such plans with control query shape or turn off
heuristics that prevent MDAM in these cases. There are two of main interest:
CQD MDAM_UNDER_NJ_PROBES_THRESHOLD turns off MDAM when the number of outer
table probes is higher than its value. It defaults to '0' so by default MDAM is
(almost) always turned off on the inner table of nested joins. Setting this to
a positive value (I used '1000' for my testing) allows MDAM. Secondly, there is
a heuristic in the JoinToTSJ transformation rule that prevents nested joins if
no inner table key prefix is covered by some predicate. This prevents
consideration of nested joins where MDAM may help when a predicate is missing
on the leading column. Setting CQD KEYLESS_NESTED_JOINS 'ON' overrides this
heuristic. Alternatively, turning off hash joins and merge joins also ove
rrides this heuristic.
Summarizing, if one wishes to test the new MDAM costing code, and wishes to
benefit from nested joins using MDAM on the inner table, then one should
presently set these CQDs:
```
CQD MDAM_UNDER_NJ_PROBES_THRESHOLD '1000'; -- (some positive value greater
than one)
CQD KEYLESS_NESTED_JOINS 'ON';
CQD MDAM_COSTING_REWRITE 'ON';
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/DaveBirdsall/trafodion MdamRewriteNJ
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/trafodion/pull/1443.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1443
----
commit c008ebe519f51e5344b650a9c88e66e973320084
Author: Dave Birdsall <dbirdsall@...>
Date: 2018-02-14T00:42:41Z
[TRAFODION-2645] Fixes to MDAM costing code found during inner join testing
----
---