[ 
https://issues.apache.org/jira/browse/TRAFODION-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944198#comment-15944198
 ] 

ASF GitHub Bot commented on TRAFODION-2552:
-------------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-trafodion/pull/1023


> Skew buster plan not chosen when join predicate involves SUBSTRs
> ----------------------------------------------------------------
>
>                 Key: TRAFODION-2552
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2552
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>    Affects Versions: 2.1-incubating, 2.2-incubating
>         Environment: All
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>
> Skew-buster is a feature of Trafodion that modifies the partitioning function 
> on joins when skewed values are present. When we notice that there are a few 
> very frequent values among many low frequency values in a join column, we use 
> a different strategy for joining the frequent values than the non-frequent 
> ones.
> The following script reproduces the problem:
> ?section setup
> drop table if exists skewSubstr;
> create table skewSubstr (a int not null, b char(16), primary key (a))
>   salt using 4 partitions;
> upsert using load into skewSubstr
> select 
>   x1 + 10*x2 + 100*x3 + 1000*x4 + 10000*x5 + 100000*x6,
>   case when mod(x1 + 10*x2 + 100*x3 + 1000*x4 + 10000*x5 + 100000*x6,97) = 0 
> then 'askewvalue'
>        else char(x5+97) || char(mod(x4 + 3*x2 + 7*x6,26)+97) || char(x4+97) 
> || char(x1+97) || 
>             char(mod(2*x1 + 5*x5 + x6,26)+97) || char(mod(x1+x2+x3,26)+97) || 
> char(x6+97) ||
>             char(x2+97) || char(x3+97) || char(mod(x1-x2+x4+2*x6,26)+97) 
>        end
> -- the from clause below creates 1,000,000 rows, the cross product of
> -- 6 copies of { 0, ... 9 }
>   from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) T(x1)
> transpose 0,1,2,3,4,5,6,7,8,9 as x2
> transpose 0,1,2,3,4,5,6,7,8,9 as x3
> transpose 0,1,2,3,4,5,6,7,8,9 as x4
> transpose 0,1,2,3,4,5,6,7,8,9 as x5
> transpose 0,1,2,3,4,5,6,7,8,9 as x6;
> update statistics for table skewSubtr on every column;
> create table otherTable(c int not null, d char(16), primary key (c))
>   salt using 4 partitions;
> upsert using load into otherTable
>   select a, case when b = 'askewvalue' then substr(b,1,10) || 
> char(mod(a,26)+97) else b end
>     from skewSubstr;
> update statistics for table otherTable on every column;
> ?section doit
> cqd SKEW_SENSITIVITY_THRESHOLD '0.0';
> cqd SKEW_ROWCOUNT_THRESHOLD '10';
> prepare goodQuery from select count(*) from skewSubstr join otherTable
>   on b = d;
> -- notice in the explain, that the descriptions on the esp_exchanges are
> -- (h2-ud) and (h2-br), which indicates that skew-buster is operative
> explain options 'f' goodQuery;
> prepare badQuery from select count(*) from skewSubstr join otherTable
>   on substr(b,1,9) = substr(d,1,9);
> -- notice in the explain, that the descriptions on the esp_exchanges are
> -- (hash2), which indicates that skew-buster is NOT operative
> explain options 'f' badQuery;
> To see the problem, look at the output of the EXPLAINs:
> >>
> >>cqd SKEW_SENSITIVITY_THRESHOLD '0.0';
> --- SQL operation complete.
> >>cqd SKEW_ROWCOUNT_THRESHOLD '10';
> --- SQL operation complete.
> >>
> >>prepare goodQuery from select count(*) from skewSubstr join otherTable
> +>  on b = d;
> --- SQL command prepared.
> >>
> >>-- notice in the explain, that the descriptions on the esp_exchanges are
> >>-- (h2-ud) and (h2-br), which indicates that skew-buster is operative
> >>explain options 'f' goodQuery;
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
> ---- ---- ---- --------------------  --------  --------------------  ---------
> 8    .    9    root                                                  1.00E+000
> 7    .    8    sort_partial_aggr_ro                                  1.00E+000
> 6    .    7    esp_exchange                    1:4(h2-ud)            1.00E+000
> 5    .    6    sort_partial_aggr_le                                  1.00E+000
> 4    2    5    hybrid_hash_join                                      1.01E+006
> 3    .    4    esp_exchange                    4(h2-ud):4(hash2)     1.00E+006
> .    .    3    trafodion_scan                  SKEWSUBSTR            1.00E+006
> 1    .    2    esp_exchange                    4(h2-br):4(hash2)     1.00E+006
> .    .    1    trafodion_scan                  OTHERTABLE            1.00E+006
> --- SQL operation complete.
> >>
> >>prepare badQuery from select count(*) from skewSubstr join otherTable
> +>  on substr(b,1,9) = substr(d,1,9);
> --- SQL command prepared.
> >>
> >>-- notice in the explain, that the descriptions on the esp_exchanges are
> >>-- (hash2), which indicates that skew-buster is NOT operative
> >>explain options 'f' badQuery;
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
> ---- ---- ---- --------------------  --------  --------------------  ---------
> 8    .    9    root                                                  1.00E+000
> 7    .    8    sort_partial_aggr_ro                                  1.00E+000
> 6    .    7    esp_exchange                    1:4(hash2)            1.00E+000
> 5    .    6    sort_partial_aggr_le                                  1.00E+000
> 4    2    5    hybrid_hash_join                                      1.79E+006
> 3    .    4    esp_exchange                    4(hash2):4(hash2)     1.00E+006
> .    .    3    trafodion_scan                  SKEWSUBSTR            1.00E+006
> 1    .    2    esp_exchange                    4(hash2):4(hash2)     1.00E+006
> .    .    1    trafodion_scan                  OTHERTABLE            1.00E+006
> --- SQL operation complete.
> Notice that when the join predicate is of the form SUBSTR(b,1,9) = 
> SUBSTR(d,1,9), vanilla hash repartitioning is chosen even though the values 
> of column b show extreme skew. The problem is that with the SUBSTRs present, 
> we use default distribution properties for the join predicate instead of 
> leveraging our knowledge about the skew in column B.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to