Github user zellerh commented on a diff in the pull request:
https://github.com/apache/incubator-trafodion/pull/195#discussion_r46211677
--- Diff: core/sql/sqlcomp/nadefaults.cpp ---
@@ -2180,6 +2180,10 @@ SDDkwd__(ISO_MAPPING, (char
*)SQLCHARSETSTRING_ISO88591),
XDDkwd__(MDAM_SCAN_METHOD, "ON"),
DDflt0_(MDAM_SELECTION_DEFAULT, "0.5"),
+
+ DDflt0_(MDAM_TOTAL_UEC_CHECK_MIN_RC_THRESHOLD, "10000"),
+ DDflt0_(MDAM_TOTAL_UEC_CHECK_UEC_THRESHOLD, "0.01"),
--- End diff --
IMHO the 1% threshold used here is too low. We are saying that if MDAM can
eliminate "only" 98% of the values then it isn't even worth thinking about the
possibility of using MDAM? Or am I misunderstanding how the heuristic works?
The other problem with this heuristic is of course that it doesn't care
about which predicates we have. Let's say we have 3 leading key columns, a,b,c,
all with integer values 0...98 (UEC=99). Let's assume the total row count is
100,000.
Case 1:
- Predicate is WHERE c = 55
- UEC of columns w/o pred (a, b) is 99*99 = 9801 (no MC stats considered!)
- Threshold: 1,000 (1 % of 100,000)
- MDAM is not considered
Case 2:
- Predicate is WHERE b between 1 and 90
- UEC of columns w/o pred (a) is 99
- Threshold is again 1,000
- MDAM is considered, even though this query is not obviously better
suited for MDAM than case 1.
Do we really need these heuristics, or could we just let costing decide? If
we do, should we a) consider MC stats for the predicate-less columns and b)
consider increasing MDAM_TOTAL_UEC_CHECK_UEC_THRESHOLD to a higher value?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---