something weird about MDAM compiler costing: isMultipleProbe() returning false...

Eric Owhadi Tue, 15 Mar 2016 15:35:19 -0700

In an attempt to implement small scanner optimization for MDAM scan, I
investigated the compiler costing of MDAM plan, with the hope of finding
out there, how many scan splits and probes would a particular MDAM plan
result in.


The immediate result would have been that I could easily guestimate the
size of each scan and determine if it is below the Hbase Block size and
therefore turn on or off small scanner accordingly. I have measured a 1.39X
boost in performance in MDAM scan for situation where scans are good fit
for small scanner… so I am motivated to get this feature in J



So I wrote a simple test:



create table t132helper (a int not null, primary key(a));

insert into t132helper values(1);

create table t132 (k1 int not null, k2 int not null, a int not null, b int
not null,

     c
char(1000),
primary key (k1,k2)) ATTRIBUTES ALIGNED FORMAT ;

upsert using load

into t132

  select  x1000*1000+ x10000*10000 + x100000*100000 ,

          x1+x10*10+x100*100,

          x1+x10*10+x100*100+ x1000*1000+ x10000*10000 + x100000*100000 ,

          x1+x10*10+x100*100+ x1000*1000+ x10000*10000 + x100000*100000 ,

          'yo bro'



  from t132helper

transpose 0,1,2,3,4,5,6,7,8,9 as x1

transpose 0,1,2,3,4,5,6,7,8,9 as x10

transpose 0,1,2,3,4,5,6,7,8,9 as x100

transpose 0,1,2,3,4,5,6,7,8,9 as x1000

transpose 0,1,2,3,4,5,6,7,8,9 as x10000

transpose 0,1,2,3,4,5,6,7,8,9 as x100000;

update statistics for table t132 on every column;





then

explain select sum(a) from t132 where k2 between 0 and 100 and k1>500000;



this correctly result in an MDAM plan but:



------------------------------------------------------------------ PLAN
SUMMARY

MODULE_NAME .............. DYNAMICALLY COMPILED

STATEMENT_NAME ........... XX

PLAN_ID .................. 212324835567955274

ROWS_OUT ................. 1

EST_TOTAL_COST ........... 1.47

STATEMENT ................ select sum(a)

                           from t132

                           where k2 between 0 and 100 and k1>500000;





------------------------------------------------------------------ NODE
LISTING

ROOT ======================================  SEQ_NO 3        ONLY CHILD 2

REQUESTS_IN .............. 1

ROWS_OUT ................. 1

EST_OPER_COST ............ 0

EST_TOTAL_COST ........... 1.47

DESCRIPTION

  max_card_est ........... 1

  fragment_id ............ 0

  parent_frag ............ (none)

  fragment_type .......... master

  statement_index ........ 0

  affinity_value ......... 0

  max_max_cardinalit  50,399

  total_overflow_size .... 0.00 KB

  xn_access_mode ......... read_only

  xn_autoabort_interval    0

  auto_query_retry ....... enabled

  plan_version ....... 2,600

  embedded_arkcmp ........ used

  IS_SQLCI ............... ON

  LDAP_USERNAME

  GENERATE_EXPLAIN ....... ON

  ObjectUIDs ............. 3134597362287934668

  select_list ............ sum(TRAFODION.SCH.T132.A)





SORT_SCALAR_AGGR ==========================  SEQ_NO 2        ONLY CHILD 1

REQUESTS_IN .............. 1

ROWS_OUT ................. 1

EST_OPER_COST ............ 0.01

EST_TOTAL_COST ........... 1.47

DESCRIPTION

  max_card_est ........... 1

  fragment_id ............ 0

  parent_frag ............ (none)

  fragment_type .......... master

  aggregates ............. sum(TRAFODION.SCH.T132.A)





TRAFODION_SCAN ============================  SEQ_NO 1        NO CHILDREN

TABLE_NAME ............... T132

REQUESTS_IN .............. 1

ROWS_OUT ............ 50,399

EST_OPER_COST ............ 1.47

EST_TOTAL_COST ........... 1.47

DESCRIPTION

  max_card_est ..... 101,000

  fragment_id ............ 0

  parent_frag ............ (none)

  fragment_type .......... master

  scan_type .............. subset scan limited by mdam of table

                             TRAFODION.SCH.T132

  object_type ............ Trafodion

  cache_size ........ 10,000

  probes ................. 1

  rows_accessed ..... 50,399

  column_retrieved ....... #1:1

  key_columns ............ K1, K2

  mdam_disjunct .......... (K1 > 500000) and (K2 >= 0) and (K2 <= 100)

  part_key_predicates .... (K1 > 500000)



--- SQL operation complete.

>>





See how probes…..1 in the explain?

This is not correct, this MDAM plan will result in at least 500 probes, and
500 scans of each 101 row.

Following its execution with debugger shows indeed that it is doing the
expected probing/scanning.





I was hoping that this probe….1 at explain was just a “cosmetic” bug, and
that I would find in the compiler code at one point an accurate tentative
to measure how many probes/slice of scan an mdam would result into…



But no, navigating inside the compiler code, look like a whole set of code
dealing with evaluating probe count and scan count get bypassed because
this function get evaluated to false:





NABoolean FileScanOptimizer::isMultipleProbes() const

{

  CostScalar repeatCount = getContext().getPlan()->getPhysicalProperty()->

    getDP2CostThatDependsOnSPP()->getRepeatCountForOperatorsInDP2() ;

  CollIndex numCols =
getContext().getInputLogProp()->getColStats().entries();



  return (repeatCount.isGreaterThanOne()) OR (numCols > 0);

}



Not sure what is the logic inside trying to do, but I know that I would
love to step into the code that get bypassed because of this false J.

Anyone familiar with this?

Thanks in advance for the help,
Eric

something weird about MDAM compiler costing: isMultipleProbe() returning false...

Reply via email to