David Wayne Birdsall created TRAFODION-2552:
-----------------------------------------------
Summary: Skew buster plan not chosen when join predicate involves
SUBSTRs
Key: TRAFODION-2552
URL: https://issues.apache.org/jira/browse/TRAFODION-2552
Project: Apache Trafodion
Issue Type: Bug
Components: sql-cmp
Affects Versions: 2.1-incubating, 2.2-incubating
Environment: All
Reporter: David Wayne Birdsall
Assignee: David Wayne Birdsall
Skew-buster is a feature of Trafodion that modifies the partitioning function
on joins when skewed values are present. When we notice that there are a few
very frequent values among many low frequency values in a join column, we use a
different strategy for joining the frequent values than the non-frequent ones.
The following script reproduces the problem:
?section setup
drop table if exists skewSubstr;
create table skewSubstr (a int not null, b char(16), primary key (a))
salt using 4 partitions;
upsert using load into skewSubstr
select
x1 + 10*x2 + 100*x3 + 1000*x4 + 10000*x5 + 100000*x6,
case when mod(x1 + 10*x2 + 100*x3 + 1000*x4 + 10000*x5 + 100000*x6,97) = 0
then 'askewvalue'
else char(x5+97) || char(mod(x4 + 3*x2 + 7*x6,26)+97) || char(x4+97) ||
char(x1+97) ||
char(mod(2*x1 + 5*x5 + x6,26)+97) || char(mod(x1+x2+x3,26)+97) ||
char(x6+97) ||
char(x2+97) || char(x3+97) || char(mod(x1-x2+x4+2*x6,26)+97)
end
-- the from clause below creates 1,000,000 rows, the cross product of
-- 6 copies of { 0, ... 9 }
from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) T(x1)
transpose 0,1,2,3,4,5,6,7,8,9 as x2
transpose 0,1,2,3,4,5,6,7,8,9 as x3
transpose 0,1,2,3,4,5,6,7,8,9 as x4
transpose 0,1,2,3,4,5,6,7,8,9 as x5
transpose 0,1,2,3,4,5,6,7,8,9 as x6;
update statistics for table skewSubtr on every column;
create table otherTable(c int not null, d char(16), primary key (c))
salt using 4 partitions;
upsert using load into otherTable
select a, case when b = 'askewvalue' then substr(b,1,10) ||
char(mod(a,26)+97) else b end
from skewSubstr;
update statistics for table otherTable on every column;
?section doit
cqd SKEW_SENSITIVITY_THRESHOLD '0.0';
cqd SKEW_ROWCOUNT_THRESHOLD '10';
prepare goodQuery from select count(*) from skewSubstr join otherTable
on b = d;
-- notice in the explain, that the descriptions on the esp_exchanges are
-- (h2-ud) and (h2-br), which indicates that skew-buster is operative
explain options 'f' goodQuery;
prepare badQuery from select count(*) from skewSubstr join otherTable
on substr(b,1,9) = substr(d,1,9);
-- notice in the explain, that the descriptions on the esp_exchanges are
-- (hash2), which indicates that skew-buster is NOT operative
explain options 'f' badQuery;
To see the problem, look at the output of the EXPLAINs:
>>
>>cqd SKEW_SENSITIVITY_THRESHOLD '0.0';
--- SQL operation complete.
>>cqd SKEW_ROWCOUNT_THRESHOLD '10';
--- SQL operation complete.
>>
>>prepare goodQuery from select count(*) from skewSubstr join otherTable
+> on b = d;
--- SQL command prepared.
>>
>>-- notice in the explain, that the descriptions on the esp_exchanges are
>>-- (h2-ud) and (h2-br), which indicates that skew-buster is operative
>>explain options 'f' goodQuery;
LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------
8 . 9 root 1.00E+000
7 . 8 sort_partial_aggr_ro 1.00E+000
6 . 7 esp_exchange 1:4(h2-ud) 1.00E+000
5 . 6 sort_partial_aggr_le 1.00E+000
4 2 5 hybrid_hash_join 1.01E+006
3 . 4 esp_exchange 4(h2-ud):4(hash2) 1.00E+006
. . 3 trafodion_scan SKEWSUBSTR 1.00E+006
1 . 2 esp_exchange 4(h2-br):4(hash2) 1.00E+006
. . 1 trafodion_scan OTHERTABLE 1.00E+006
--- SQL operation complete.
>>
>>prepare badQuery from select count(*) from skewSubstr join otherTable
+> on substr(b,1,9) = substr(d,1,9);
--- SQL command prepared.
>>
>>-- notice in the explain, that the descriptions on the esp_exchanges are
>>-- (hash2), which indicates that skew-buster is NOT operative
>>explain options 'f' badQuery;
LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------
8 . 9 root 1.00E+000
7 . 8 sort_partial_aggr_ro 1.00E+000
6 . 7 esp_exchange 1:4(hash2) 1.00E+000
5 . 6 sort_partial_aggr_le 1.00E+000
4 2 5 hybrid_hash_join 1.79E+006
3 . 4 esp_exchange 4(hash2):4(hash2) 1.00E+006
. . 3 trafodion_scan SKEWSUBSTR 1.00E+006
1 . 2 esp_exchange 4(hash2):4(hash2) 1.00E+006
. . 1 trafodion_scan OTHERTABLE 1.00E+006
--- SQL operation complete.
Notice that when the join predicate is of the form SUBSTR(b,1,9) =
SUBSTR(d,1,9), vanilla hash repartitioning is chosen even though the values of
column b show extreme skew. The problem is that with the SUBSTRs present, we
use default distribution properties for the join predicate instead of
leveraging our knowledge about the skew in column B.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)