Nemon Lou created HIVE-24902:
--------------------------------

             Summary: Incorrect result due to ReduceExpressionsRule
                 Key: HIVE-24902
                 URL: https://issues.apache.org/jira/browse/HIVE-24902
             Project: Hive
          Issue Type: Bug
          Components: CBO
    Affects Versions: 3.1.2, 4.0.0
            Reporter: Nemon Lou


The following sql returns only one record (20210308)but we expect two(20210308
20210309).

{code:sql}
select * from (
select 
                case when b.a=1
                                   then  
                                                        cast 
(from_unixtime(unix_timestamp(cast(20210309 as string),'yyyyMMdd') - 
86400,'yyyyMMdd') as bigint)
                                  else 
                                                  20210309 
           end 
as col
from 
(select stack(2,1,2) as (a))
 as b
) t 
where t.col is not null;
{code}

After debuging, i find the ReduceExpressionsRule changes expression in the 
wrong way.
Original expression:

{code:sql}
IS NOT NULL(CASE(=($0, 1), 
CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
 CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
_UTF-16LE'yyyyMMdd'), CAST(86400):BIGINT), _UTF-16LE'yyyyMMdd')):BIGINT, 
20210309))
{code}

After reducing expressions:
{code:sql}
CASE(=($0, 1), IS NOT 
NULL(CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
 CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
_UTF-16LE'yyyyMMdd'), CAST(86400):BIGINT), _UTF-16LE'yyyyMMdd')):BIGINT), true)
{code}

The query plan in main branch:
{code:sql}
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: _dummy_table
          Row Limit Per Split: 1
          Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column 
stats: COMPLETE
          Select Operator
            expressions: 2 (type: int), 1 (type: int), 2 (type: int)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE Column 
stats: COMPLETE
            UDTF Operator
              Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
Column stats: COMPLETE
              function name: stack
              Filter Operator
                predicate: COALESCE((col0 = 1),false) (type: boolean)
                Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
Column stats: COMPLETE
                Select Operator
                  expressions: CASE WHEN ((col0 = 1)) THEN (20210308L) ELSE 
(20210309L) END (type: bigint)
                  outputColumnNames: _col0
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
                  ListSink

Time taken: 0.155 seconds, Fetched: 28 row(s)

{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to