David Wayne Birdsall created TRAFODION-2552:
-----------------------------------------------

             Summary: Skew buster plan not chosen when join predicate involves 
SUBSTRs
                 Key: TRAFODION-2552
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2552
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-cmp
    Affects Versions: 2.1-incubating, 2.2-incubating
         Environment: All
            Reporter: David Wayne Birdsall
            Assignee: David Wayne Birdsall


Skew-buster is a feature of Trafodion that modifies the partitioning function 
on joins when skewed values are present. When we notice that there are a few 
very frequent values among many low frequency values in a join column, we use a 
different strategy for joining the frequent values than the non-frequent ones.

The following script reproduces the problem:

?section setup

drop table if exists skewSubstr;

create table skewSubstr (a int not null, b char(16), primary key (a))
  salt using 4 partitions;

upsert using load into skewSubstr
select 
  x1 + 10*x2 + 100*x3 + 1000*x4 + 10000*x5 + 100000*x6,
  case when mod(x1 + 10*x2 + 100*x3 + 1000*x4 + 10000*x5 + 100000*x6,97) = 0 
then 'askewvalue'
       else char(x5+97) || char(mod(x4 + 3*x2 + 7*x6,26)+97) || char(x4+97) || 
char(x1+97) || 
            char(mod(2*x1 + 5*x5 + x6,26)+97) || char(mod(x1+x2+x3,26)+97) || 
char(x6+97) ||
            char(x2+97) || char(x3+97) || char(mod(x1-x2+x4+2*x6,26)+97) 
       end
-- the from clause below creates 1,000,000 rows, the cross product of
-- 6 copies of { 0, ... 9 }
  from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) T(x1)
transpose 0,1,2,3,4,5,6,7,8,9 as x2
transpose 0,1,2,3,4,5,6,7,8,9 as x3
transpose 0,1,2,3,4,5,6,7,8,9 as x4
transpose 0,1,2,3,4,5,6,7,8,9 as x5
transpose 0,1,2,3,4,5,6,7,8,9 as x6;

update statistics for table skewSubtr on every column;

create table otherTable(c int not null, d char(16), primary key (c))
  salt using 4 partitions;

upsert using load into otherTable
  select a, case when b = 'askewvalue' then substr(b,1,10) || 
char(mod(a,26)+97) else b end
    from skewSubstr;

update statistics for table otherTable on every column;

?section doit

cqd SKEW_SENSITIVITY_THRESHOLD '0.0';
cqd SKEW_ROWCOUNT_THRESHOLD '10';

prepare goodQuery from select count(*) from skewSubstr join otherTable
  on b = d;

-- notice in the explain, that the descriptions on the esp_exchanges are
-- (h2-ud) and (h2-br), which indicates that skew-buster is operative
explain options 'f' goodQuery;

prepare badQuery from select count(*) from skewSubstr join otherTable
  on substr(b,1,9) = substr(d,1,9);

-- notice in the explain, that the descriptions on the esp_exchanges are
-- (hash2), which indicates that skew-buster is NOT operative
explain options 'f' badQuery;

To see the problem, look at the output of the EXPLAINs:

>>
>>cqd SKEW_SENSITIVITY_THRESHOLD '0.0';

--- SQL operation complete.
>>cqd SKEW_ROWCOUNT_THRESHOLD '10';

--- SQL operation complete.
>>
>>prepare goodQuery from select count(*) from skewSubstr join otherTable
+>  on b = d;

--- SQL command prepared.
>>
>>-- notice in the explain, that the descriptions on the esp_exchanges are
>>-- (h2-ud) and (h2-br), which indicates that skew-buster is operative
>>explain options 'f' goodQuery;

LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
---- ---- ---- --------------------  --------  --------------------  ---------

8    .    9    root                                                  1.00E+000
7    .    8    sort_partial_aggr_ro                                  1.00E+000
6    .    7    esp_exchange                    1:4(h2-ud)            1.00E+000
5    .    6    sort_partial_aggr_le                                  1.00E+000
4    2    5    hybrid_hash_join                                      1.01E+006
3    .    4    esp_exchange                    4(h2-ud):4(hash2)     1.00E+006
.    .    3    trafodion_scan                  SKEWSUBSTR            1.00E+006
1    .    2    esp_exchange                    4(h2-br):4(hash2)     1.00E+006
.    .    1    trafodion_scan                  OTHERTABLE            1.00E+006

--- SQL operation complete.
>>
>>prepare badQuery from select count(*) from skewSubstr join otherTable
+>  on substr(b,1,9) = substr(d,1,9);

--- SQL command prepared.
>>
>>-- notice in the explain, that the descriptions on the esp_exchanges are
>>-- (hash2), which indicates that skew-buster is NOT operative
>>explain options 'f' badQuery;

LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
---- ---- ---- --------------------  --------  --------------------  ---------

8    .    9    root                                                  1.00E+000
7    .    8    sort_partial_aggr_ro                                  1.00E+000
6    .    7    esp_exchange                    1:4(hash2)            1.00E+000
5    .    6    sort_partial_aggr_le                                  1.00E+000
4    2    5    hybrid_hash_join                                      1.79E+006
3    .    4    esp_exchange                    4(hash2):4(hash2)     1.00E+006
.    .    3    trafodion_scan                  SKEWSUBSTR            1.00E+006
1    .    2    esp_exchange                    4(hash2):4(hash2)     1.00E+006
.    .    1    trafodion_scan                  OTHERTABLE            1.00E+006

--- SQL operation complete.

Notice that when the join predicate is of the form SUBSTR(b,1,9) = 
SUBSTR(d,1,9), vanilla hash repartitioning is chosen even though the values of 
column b show extreme skew. The problem is that with the SUBSTRs present, we 
use default distribution properties for the join predicate instead of 
leveraging our knowledge about the skew in column B.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to