[
https://issues.apache.org/jira/browse/HIVE-24775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Krisztian Kasa reassigned HIVE-24775:
-------------------------------------
> Incorrect null handling when rebuilding Materialized view incrementally
> -----------------------------------------------------------------------
>
> Key: HIVE-24775
> URL: https://issues.apache.org/jira/browse/HIVE-24775
> Project: Hive
> Issue Type: Bug
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
>
> {code}
> CREATE TABLE t1 (a int, b varchar(256), c decimal(10,2), d int) STORED AS orc
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO t1 VALUES
> (NULL, 'null_value', 100.77, 7),
> (1, 'calvin', 978.76, 3),
> (1, 'charlie', 9.8, 1);
> CREATE MATERIALIZED VIEW mat1 TBLPROPERTIES ('transactional'='true') AS
> SELECT a, b, sum(d)
> FROM t1
> WHERE c > 10.0
> GROUP BY a, b;
> INSERT INTO t1 VALUES
> (NULL, 'null_value', 100.88, 8),
> (1, 'charlie', 15.8, 1);
> ALTER MATERIALIZED VIEW mat1 REBUILD;
> SELECT * FROM mat1
> ORDER BY a, b;
> {code}
> View contains:
> {code}
> 1 calvin 3
> 1 charlie 1
> NULL null_value 8
> NULL null_value 7
> {code}
> but it should contain:
> {code}
> 1 calvin 3
> 1 charlie 1
> NULL null_value 15
> {code}
> Rows with aggregate key columns having NULL values are not aggregated because
> incremental materialized view rebuild plan is altered by
> [applyPreJoinOrderingTransforms|https://github.com/apache/hive/blob/76732ad27e139fbdef25b820a07cf35934771083/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L1975]:
> IS NOT NULL filter added for each of these columns on top of the view scan
> when joining with the branch pulls the rows inserted after the last rebuild:
> {code}
> HiveProject($f0=[$3], $f1=[$4], $f2=[CASE(AND(IS NULL($0), IS NULL($1)), $5,
> +($5, $2))])
> HiveFilter(condition=[OR(AND(IS NULL($0), IS NULL($1)), AND(=($0, $3),
> =($1, $4)))])
> HiveJoin(condition=[AND(=($0, $3), =($1, $4))], joinType=[right],
> algorithm=[none], cost=[not available])
> HiveProject(a=[$0], b=[$1], _c2=[$2])
> HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($1))])
> HiveTableScan(table=[[default, mat1]], table:alias=[default.mat1])
> HiveProject(a=[$0], b=[$1], $f2=[$2])
> HiveAggregate(group=[{0, 1}], agg#0=[sum($3)])
> HiveFilter(condition=[AND(<(1, $6.writeid), >($2, 10))])
> HiveTableScan(table=[[default, t1]], table:alias=[t1])
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)