Github user nonstop-qfchen commented on a diff in the pull request:
https://github.com/apache/incubator-trafodion/pull/195#discussion_r46225727
--- Diff: core/sql/sqlcomp/nadefaults.cpp ---
@@ -2180,6 +2180,10 @@ SDDkwd__(ISO_MAPPING, (char
*)SQLCHARSETSTRING_ISO88591),
XDDkwd__(MDAM_SCAN_METHOD, "ON"),
DDflt0_(MDAM_SELECTION_DEFAULT, "0.5"),
+
+ DDflt0_(MDAM_TOTAL_UEC_CHECK_MIN_RC_THRESHOLD, "10000"),
+ DDflt0_(MDAM_TOTAL_UEC_CHECK_UEC_THRESHOLD, "0.01"),
--- End diff --
Hi Hans and Dave,
Thanks a lot for the review comments.
There is a piece of detail that I need to describe.
The method checkMDAMadditionalRestriction() takes the argument
lastColumnPosition which is 1 plus the index of the stopping column to
consider the heuristic. In the call, the actual argument is 6, or to the
column BRUKRS that has the predicate defined.
For the two cases for discussion, the index of the stopping columns will be
3 for case 1 and to 2 for case 2. In other words, the level of protection
would be different.
On the use of MC, I believe MC stats will further reduce the total UEC.
However, currently, the MCs are generated by default for any prefixes of
the key columns. So if we want to integrate the MC stats, the RC count can
be reduced only by the MC on the longest prefix of key columns that are
free of predicate. I think this is a good suggestion and will try to see if
it can be arranged.
Third, I am open to suggestion to fine tune the threshold (if needed, say
to 10%?).
Lastly, on whether this heuristic should be put into the place. My view
here is that the costing is never perfect for MDAM and therefor extra
orthogonal protections are always good. In this special customer case, the
MDAM is rejected even if the threshold CQD is raised to 100%. This
indicates that the heuristic works.
Thanks --Qifan
key_columns ............ _SALT_, RYEAR, DOCLN, DOCNR, RLDNR, RBUKRS, RCLNT
executor_predicates .... (RYEAR = '2014') and (RBUKRS = 'A019')
begin_key .............. (_SALT_ = (\:_sys_HostVarLoHashPart Hash2Distrib
32)), (RYEAR = '2014'), (DOCLN = '<min>'),
(DOCNR = '<min>'), (RLDNR = '<min>'),
(RBUKRS = 'A019'), (RCLNT = '<min>')
end_key ................ (_SALT_ = (\:_sys_HostVarHiHashPart Hash2Distrib
32)), (RYEAR = '2014'), (DOCLN = '<max>'),
(DOCNR = '<max>'), (RLDNR = '<max>'),
(RBUKRS = 'A019'), (RCLNT = '<max>')
#0 checkMDAMadditionalRestriction (keyPredsByCol=...,
lastColumnPosition=@0x7ffffffeb8ec,
hist=..., strategy=TOTAL_UECS, noOfmissingKeyColumns=@0x7ffffffeb8e4,
presentKeyColumns=@0x7ffffffeb8e0) at ../optimizer/ScanOptimizer.cpp:329
329 KeyColumns::KeyColumn::KeyColumnType typeOfRange =
KeyColumns::KeyColumn::EMPTY;
(gdb) p lastColumnPosition
$1 = (const CollIndex &) @0x7ffffffeb8ec: 6
On Mon, Nov 30, 2015 at 5:56 PM, DaveBirdsall <[email protected]>
wrote:
> In core/sql/sqlcomp/nadefaults.cpp
>
<https://github.com/apache/incubator-trafodion/pull/195#discussion_r46222783>
> :
>
> > @@ -2180,6 +2180,10 @@ SDDkwd__(ISO_MAPPING, (char
*)SQLCHARSETSTRING_ISO88591),
> > XDDkwd__(MDAM_SCAN_METHOD, "ON"),
> >
> > DDflt0_(MDAM_SELECTION_DEFAULT, "0.5"),
> > +
> > + DDflt0_(MDAM_TOTAL_UEC_CHECK_MIN_RC_THRESHOLD, "10000"),
> > + DDflt0_(MDAM_TOTAL_UEC_CHECK_UEC_THRESHOLD, "0.01"),
>
> Hi Hans,
>
> Actually, your Case 2 is better suited for MDAM than Case 1. In Case 2,
> we'll materialize each of the values of A (99 of them) and do a begin/end
> subset from B = 1 to B = 90. Just 99 subsets. In Case 1, we'll materialize
> each of the values of A (done 99 times), and for each value of A,
> materialize the values of B (9801 times, or less if there are functional
> dependencies which MC stats would show), and then for each of these, do a
> begin/end subset on C. Each materialization is potentially a random I/O,
> not to mention some significant path length (compared to the path length
of
> just traversing to the next row sequentially). But more to your point, the
> costing code takes into account these differences.
>
> â
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/incubator-trafodion/pull/195/files#r46222783>.
>
--
Regards, --Qifan
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---