Github user nonstop-qfchen commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/195#discussion_r46225727
  
    --- Diff: core/sql/sqlcomp/nadefaults.cpp ---
    @@ -2180,6 +2180,10 @@ SDDkwd__(ISO_MAPPING,           (char 
*)SQLCHARSETSTRING_ISO88591),
       XDDkwd__(MDAM_SCAN_METHOD,                       "ON"),
     
       DDflt0_(MDAM_SELECTION_DEFAULT,          "0.5"),
    +
    +  DDflt0_(MDAM_TOTAL_UEC_CHECK_MIN_RC_THRESHOLD, "10000"),
    +  DDflt0_(MDAM_TOTAL_UEC_CHECK_UEC_THRESHOLD,       "0.01"),
    --- End diff --
    
    Hi Hans and Dave,
    
    Thanks a lot for the review comments.
    
    There is a piece of detail that I need to describe.
    
    The method checkMDAMadditionalRestriction() takes the argument
    lastColumnPosition which is 1 plus the index of the stopping column to
    consider the heuristic.  In the call, the actual argument is 6, or to the
    column BRUKRS that has the predicate defined.
    
    For the two cases for discussion, the index of the stopping columns will be
    3 for case 1 and to 2 for case 2.  In other words, the level of protection
    would be different.
    
    On the use of MC, I believe MC stats will further reduce the total UEC.
    However, currently, the MCs are generated by default for any prefixes of
    the key columns. So if we want to integrate the MC stats, the RC count can
    be reduced only by the MC on the longest prefix of key columns that are
    free of predicate. I think this is a good suggestion and will try to see if
    it can be arranged.
    
    Third, I am open to suggestion to fine tune the threshold (if needed, say
    to 10%?).
    
    Lastly,  on whether this heuristic should be put into the place.  My view
    here is that the costing is never perfect for MDAM and therefor extra
    orthogonal protections are always good. In this special customer case, the
    MDAM is rejected even if the threshold CQD is raised to 100%. This
    indicates that the heuristic works.
    
    Thanks --Qifan
    
     key_columns ............ _SALT_, RYEAR, DOCLN, DOCNR, RLDNR, RBUKRS, RCLNT
      executor_predicates .... (RYEAR = '2014') and (RBUKRS = 'A019')
      begin_key .............. (_SALT_ = (\:_sys_HostVarLoHashPart Hash2Distrib
                                 32)), (RYEAR = '2014'), (DOCLN = '<min>'),
                                 (DOCNR = '<min>'), (RLDNR = '<min>'),
                                 (RBUKRS = 'A019'), (RCLNT = '<min>')
      end_key ................ (_SALT_ = (\:_sys_HostVarHiHashPart Hash2Distrib
                                 32)), (RYEAR = '2014'), (DOCLN = '<max>'),
                                 (DOCNR = '<max>'), (RLDNR = '<max>'),
                                 (RBUKRS = 'A019'), (RCLNT = '<max>')
    
    
    
    #0  checkMDAMadditionalRestriction (keyPredsByCol=...,
    lastColumnPosition=@0x7ffffffeb8ec,
        hist=..., strategy=TOTAL_UECS, noOfmissingKeyColumns=@0x7ffffffeb8e4,
        presentKeyColumns=@0x7ffffffeb8e0) at ../optimizer/ScanOptimizer.cpp:329
    329   KeyColumns::KeyColumn::KeyColumnType typeOfRange =
    KeyColumns::KeyColumn::EMPTY;
    (gdb) p lastColumnPosition
    $1 = (const CollIndex &) @0x7ffffffeb8ec: 6
    
    On Mon, Nov 30, 2015 at 5:56 PM, DaveBirdsall <[email protected]>
    wrote:
    
    > In core/sql/sqlcomp/nadefaults.cpp
    > 
<https://github.com/apache/incubator-trafodion/pull/195#discussion_r46222783>
    > :
    >
    > > @@ -2180,6 +2180,10 @@ SDDkwd__(ISO_MAPPING,           (char 
*)SQLCHARSETSTRING_ISO88591),
    > >    XDDkwd__(MDAM_SCAN_METHOD,                   "ON"),
    > >
    > >    DDflt0_(MDAM_SELECTION_DEFAULT,              "0.5"),
    > > +
    > > +  DDflt0_(MDAM_TOTAL_UEC_CHECK_MIN_RC_THRESHOLD, "10000"),
    > > +  DDflt0_(MDAM_TOTAL_UEC_CHECK_UEC_THRESHOLD,   "0.01"),
    >
    > Hi Hans,
    >
    > Actually, your Case 2 is better suited for MDAM than Case 1. In Case 2,
    > we'll materialize each of the values of A (99 of them) and do a begin/end
    > subset from B = 1 to B = 90. Just 99 subsets. In Case 1, we'll materialize
    > each of the values of A (done 99 times), and for each value of A,
    > materialize the values of B (9801 times, or less if there are functional
    > dependencies which MC stats would show), and then for each of these, do a
    > begin/end subset on C. Each materialization is potentially a random I/O,
    > not to mention some significant path length (compared to the path length 
of
    > just traversing to the next row sequentially). But more to your point, the
    > costing code takes into account these differences.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/incubator-trafodion/pull/195/files#r46222783>.
    >
    
    
    
    -- 
    Regards, --Qifan



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to