[ 
https://issues.apache.org/jira/browse/HIVE-24775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-24775:
-------------------------------------


> Incorrect null handling when rebuilding Materialized view incrementally
> -----------------------------------------------------------------------
>
>                 Key: HIVE-24775
>                 URL: https://issues.apache.org/jira/browse/HIVE-24775
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>
> {code}
> CREATE TABLE t1 (a int, b varchar(256), c decimal(10,2), d int) STORED AS orc 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO t1 VALUES
>  (NULL, 'null_value', 100.77, 7),
>  (1, 'calvin', 978.76, 3),
>  (1, 'charlie', 9.8, 1);
> CREATE MATERIALIZED VIEW mat1 TBLPROPERTIES ('transactional'='true') AS
>   SELECT a, b, sum(d)
>   FROM t1
>   WHERE c > 10.0
>   GROUP BY a, b;
> INSERT INTO t1 VALUES
>  (NULL, 'null_value', 100.88, 8),
>  (1, 'charlie', 15.8, 1);
> ALTER MATERIALIZED VIEW mat1 REBUILD;
> SELECT * FROM mat1
> ORDER BY a, b;
> {code}
> View contains:
> {code}
> 1     calvin  3
> 1     charlie 1
> NULL  null_value      8
> NULL  null_value      7
> {code}
> but it should contain:
> {code}
> 1     calvin  3
> 1     charlie 1
> NULL  null_value      15
> {code}
> Rows with aggregate key columns having NULL values are not aggregated because 
> incremental materialized view rebuild plan is altered by 
> [applyPreJoinOrderingTransforms|https://github.com/apache/hive/blob/76732ad27e139fbdef25b820a07cf35934771083/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L1975]:
>   IS NOT NULL filter added for each of these columns on top of the view scan 
> when joining with the branch pulls the rows inserted after the last rebuild:
> {code}
> HiveProject($f0=[$3], $f1=[$4], $f2=[CASE(AND(IS NULL($0), IS NULL($1)), $5, 
> +($5, $2))])
>   HiveFilter(condition=[OR(AND(IS NULL($0), IS NULL($1)), AND(=($0, $3), 
> =($1, $4)))])
>     HiveJoin(condition=[AND(=($0, $3), =($1, $4))], joinType=[right], 
> algorithm=[none], cost=[not available])
>       HiveProject(a=[$0], b=[$1], _c2=[$2])
>         HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($1))])
>           HiveTableScan(table=[[default, mat1]], table:alias=[default.mat1])
>       HiveProject(a=[$0], b=[$1], $f2=[$2])
>         HiveAggregate(group=[{0, 1}], agg#0=[sum($3)])
>           HiveFilter(condition=[AND(<(1, $6.writeid), >($2, 10))])
>             HiveTableScan(table=[[default, t1]], table:alias=[t1])
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to