Krisztian Kasa created HIVE-28572:
-------------------------------------

             Summary: Support Distribute by and Cluster by clauses in CBO
                 Key: HIVE-28572
                 URL: https://issues.apache.org/jira/browse/HIVE-28572
             Project: Hive
          Issue Type: Improvement
      Security Level: Public (Viewable by anyone)
          Components: CBO
            Reporter: Krisztian Kasa
            Assignee: Krisztian Kasa


If a query has a {{distribute by}} or {{cluster by}} clause, CBO is turned off and only non-CBO optimizations are applied to the query plan.
One consequence of bypassing CBO is that implicit type conversions are not added to the plan.
Example:
{code:java}
create table t1 (a string, b int);

insert into t1 values ('2014-03-14 10:10:12', 10);

select * from t1 where a between date_add('2014-03-14', -1) and '2014-03-14' 
distribute by a;
{code}
{code:java}
                TableScan
                  alias: t1
                  filterExpr: a BETWEEN DATE'2014-03-13' AND '2014-03-14' (type: boolean)
{code}
versus the same query without the {{distribute by}} clause, where CBO is applied:
{code:java}
select * from t1 where a between date_add('2014-03-14', -1) and '2014-03-14'
{code}
{code:java}
                TableScan
                  alias: t1
                  filterExpr: CAST( a AS DATE) BETWEEN DATE'2014-03-13' AND DATE'2014-03-14' (type: boolean)
{code}
Moreover, if vectorization is turned off, the two queries above return different results, which leads to data corruption.
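The Java sketch below shows one plausible way the results can diverge for the inserted row. It is an illustration, not Hive's evaluation code. It assumes that, without the implicit cast, the string column ends up being compared to the bounds as plain strings, whereas the CBO plan compares dates via {{CAST( a AS DATE)}}. The class and method names ({{BetweenSemantics}}, {{stringBetween}}, {{dateBetween}}, {{toDate}}) are hypothetical.

{code:java}
import java.time.LocalDate;

public class BetweenSemantics {
    // Assumption: the non-CBO path compares the string value to the bounds
    // lexicographically. '2014-03-14 10:10:12' sorts after '2014-03-14'
    // because it is longer with an equal prefix, so the upper bound check fails.
    public static boolean stringBetween(String value, String low, String high) {
        return value.compareTo(low) >= 0 && value.compareTo(high) <= 0;
    }

    // Mirrors the CBO plan's CAST(a AS DATE) BETWEEN DATE'...' AND DATE'...':
    // the time part is dropped before comparing, so the row qualifies.
    public static boolean dateBetween(String value, LocalDate low, LocalDate high) {
        LocalDate d = toDate(value);
        return !d.isBefore(low) && !d.isAfter(high);
    }

    // Hypothetical helper: keeps the 'yyyy-MM-dd' prefix of a timestamp string.
    public static LocalDate toDate(String value) {
        return LocalDate.parse(value.substring(0, 10));
    }

    public static void main(String[] args) {
        String a = "2014-03-14 10:10:12";
        System.out.println("string BETWEEN: " + stringBetween(a, "2014-03-13", "2014-03-14"));
        System.out.println("date BETWEEN:   "
                + dateBetween(a, LocalDate.parse("2014-03-13"), LocalDate.parse("2014-03-14")));
    }
}
{code}

Under these assumptions the sketch prints {{false}} for the string comparison and {{true}} for the date comparison, so the same row would be filtered out by one plan and returned by the other.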
