[
https://issues.apache.org/jira/browse/TRAFODION-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Wayne Birdsall resolved TRAFODION-2552.
---------------------------------------------
Resolution: Fixed
Fix Version/s: 2.2-incubating
> Skew buster plan not chosen when join predicate involves SUBSTRs
> ----------------------------------------------------------------
>
> Key: TRAFODION-2552
> URL: https://issues.apache.org/jira/browse/TRAFODION-2552
> Project: Apache Trafodion
> Issue Type: Bug
> Components: sql-cmp
> Affects Versions: 2.1-incubating, 2.2-incubating
> Environment: All
> Reporter: David Wayne Birdsall
> Assignee: David Wayne Birdsall
> Fix For: 2.2-incubating
>
>
> Skew-buster is a feature of Trafodion that modifies the partitioning function
> on joins when skewed values are present. When we notice that there are a few
> very frequent values among many low frequency values in a join column, we use
> a different strategy for joining the frequent values than the non-frequent
> ones.
> The following script reproduces the problem:
> ?section setup
> drop table if exists skewSubstr;
> create table skewSubstr (a int not null, b char(16), primary key (a))
> salt using 4 partitions;
> upsert using load into skewSubstr
> select
> x1 + 10*x2 + 100*x3 + 1000*x4 + 10000*x5 + 100000*x6,
> case when mod(x1 + 10*x2 + 100*x3 + 1000*x4 + 10000*x5 + 100000*x6,97) = 0
> then 'askewvalue'
> else char(x5+97) || char(mod(x4 + 3*x2 + 7*x6,26)+97) || char(x4+97)
> || char(x1+97) ||
> char(mod(2*x1 + 5*x5 + x6,26)+97) || char(mod(x1+x2+x3,26)+97) ||
> char(x6+97) ||
> char(x2+97) || char(x3+97) || char(mod(x1-x2+x4+2*x6,26)+97)
> end
> -- the from clause below creates 1,000,000 rows, the cross product of
> -- 6 copies of { 0, ... 9 }
> from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) T(x1)
> transpose 0,1,2,3,4,5,6,7,8,9 as x2
> transpose 0,1,2,3,4,5,6,7,8,9 as x3
> transpose 0,1,2,3,4,5,6,7,8,9 as x4
> transpose 0,1,2,3,4,5,6,7,8,9 as x5
> transpose 0,1,2,3,4,5,6,7,8,9 as x6;
> update statistics for table skewSubtr on every column;
> create table otherTable(c int not null, d char(16), primary key (c))
> salt using 4 partitions;
> upsert using load into otherTable
> select a, case when b = 'askewvalue' then substr(b,1,10) ||
> char(mod(a,26)+97) else b end
> from skewSubstr;
> update statistics for table otherTable on every column;
> ?section doit
> cqd SKEW_SENSITIVITY_THRESHOLD '0.0';
> cqd SKEW_ROWCOUNT_THRESHOLD '10';
> prepare goodQuery from select count(*) from skewSubstr join otherTable
> on b = d;
> -- notice in the explain, that the descriptions on the esp_exchanges are
> -- (h2-ud) and (h2-br), which indicates that skew-buster is operative
> explain options 'f' goodQuery;
> prepare badQuery from select count(*) from skewSubstr join otherTable
> on substr(b,1,9) = substr(d,1,9);
> -- notice in the explain, that the descriptions on the esp_exchanges are
> -- (hash2), which indicates that skew-buster is NOT operative
> explain options 'f' badQuery;
> To see the problem, look at the output of the EXPLAINs:
> >>
> >>cqd SKEW_SENSITIVITY_THRESHOLD '0.0';
> --- SQL operation complete.
> >>cqd SKEW_ROWCOUNT_THRESHOLD '10';
> --- SQL operation complete.
> >>
> >>prepare goodQuery from select count(*) from skewSubstr join otherTable
> +> on b = d;
> --- SQL command prepared.
> >>
> >>-- notice in the explain, that the descriptions on the esp_exchanges are
> >>-- (h2-ud) and (h2-br), which indicates that skew-buster is operative
> >>explain options 'f' goodQuery;
> LC RC OP OPERATOR OPT DESCRIPTION CARD
> ---- ---- ---- -------------------- -------- -------------------- ---------
> 8 . 9 root 1.00E+000
> 7 . 8 sort_partial_aggr_ro 1.00E+000
> 6 . 7 esp_exchange 1:4(h2-ud) 1.00E+000
> 5 . 6 sort_partial_aggr_le 1.00E+000
> 4 2 5 hybrid_hash_join 1.01E+006
> 3 . 4 esp_exchange 4(h2-ud):4(hash2) 1.00E+006
> . . 3 trafodion_scan SKEWSUBSTR 1.00E+006
> 1 . 2 esp_exchange 4(h2-br):4(hash2) 1.00E+006
> . . 1 trafodion_scan OTHERTABLE 1.00E+006
> --- SQL operation complete.
> >>
> >>prepare badQuery from select count(*) from skewSubstr join otherTable
> +> on substr(b,1,9) = substr(d,1,9);
> --- SQL command prepared.
> >>
> >>-- notice in the explain, that the descriptions on the esp_exchanges are
> >>-- (hash2), which indicates that skew-buster is NOT operative
> >>explain options 'f' badQuery;
> LC RC OP OPERATOR OPT DESCRIPTION CARD
> ---- ---- ---- -------------------- -------- -------------------- ---------
> 8 . 9 root 1.00E+000
> 7 . 8 sort_partial_aggr_ro 1.00E+000
> 6 . 7 esp_exchange 1:4(hash2) 1.00E+000
> 5 . 6 sort_partial_aggr_le 1.00E+000
> 4 2 5 hybrid_hash_join 1.79E+006
> 3 . 4 esp_exchange 4(hash2):4(hash2) 1.00E+006
> . . 3 trafodion_scan SKEWSUBSTR 1.00E+006
> 1 . 2 esp_exchange 4(hash2):4(hash2) 1.00E+006
> . . 1 trafodion_scan OTHERTABLE 1.00E+006
> --- SQL operation complete.
> Notice that when the join predicate is of the form SUBSTR(b,1,9) =
> SUBSTR(d,1,9), vanilla hash repartitioning is chosen even though the values
> of column b show extreme skew. The problem is that with the SUBSTRs present,
> we use default distribution properties for the join predicate instead of
> leveraging our knowledge about the skew in column B.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)