[
https://issues.apache.org/jira/browse/HIVE-26737?focusedWorklogId=827811&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-827811
]
ASF GitHub Bot logged work on HIVE-26737:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 22/Nov/22 00:01
Start Date: 22/Nov/22 00:01
Worklog Time Spent: 10m
Work Description: scarlin-cloudera commented on code in PR #3761:
URL: https://github.com/apache/hive/pull/3761#discussion_r1028622471
##########
ql/src/test/results/clientpositive/perf/tpcds30tb/tez/cbo_query1.q.out:
##########
@@ -19,8 +19,8 @@ HiveSortLimit(sort0=[$0], dir0=[ASC], fetch=[100])
HiveProject(s_store_sk=[$0])
HiveFilter(condition=[=($24, _UTF-16LE'NM')])
HiveTableScan(table=[[default, store]], table:alias=[store])
- HiveProject(_o__c0=[*(CAST(/($1, $2)):DECIMAL(21, 6), 1.2:DECIMAL(2, 1))], ctr_store_sk=[$0])
- HiveFilter(condition=[IS NOT NULL(CAST(/($1, $2)):DECIMAL(21, 6))])
+ HiveProject(ctr_store_sk=[$0], CAST=[CAST(*(CAST(/($1, $2)):DECIMAL(21, 6), 1.2:DECIMAL(2, 1))):DECIMAL(24, 7)])
+ HiveFilter(condition=[IS NOT NULL(CAST(*(CAST(/($1, $2)):DECIMAL(21, 6), 1.2:DECIMAL(2, 1))):DECIMAL(24, 7))])
Review Comment:
Good catch!
I missed this change; I thought I had only trivial changes.
After looking at this, there was indeed a change of behavior: the code was
looking at multiple aggregates in the RelNode stack for "group by" statements
when it should look only at the first aggregate. I made the code change and
this regression went away.
Issue Time Tracking
-------------------
Worklog Id: (was: 827811)
Time Spent: 1.5h (was: 1h 20m)
> Subquery returning wrong results when database has materialized views
> ---------------------------------------------------------------------
>
> Key: HIVE-26737
> URL: https://issues.apache.org/jira/browse/HIVE-26737
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Steve Carlin
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> When HS2 has materialized views in its registry, subqueries with correlated
> variables may return wrong results.
> An example of this:
>
> {code:java}
> CREATE TABLE t_test1(
> id int,
> int_col int,
> year int,
> month int
> );
> CREATE TABLE t_test2(
> id int,
> int_col int,
> year int,
> month int
> );
> CREATE TABLE dummy (
> id int
> ) stored as orc TBLPROPERTIES ('transactional'='true');
> CREATE MATERIALIZED VIEW need_a_mat_view_in_registry AS
> SELECT * FROM dummy where id > 5;
> INSERT INTO t_test1 VALUES (1, 1, 2009, 1), (10,0, 2009, 1);
> INSERT INTO t_test2 VALUES (1, 1, 2009, 1);
> select id, int_col, year, month from t_test1 s where s.int_col = (select
> count(*) from t_test2 t where s.id = t.id) order by id;
> {code}
> The select statement should produce 2 rows, but it produces only one: the row
> with id 10 has no match in t_test2, so count(*) is 0 and equals its int_col.
> The CBO plan produced has an inner join instead of a left join.
> {code:java}
> HiveSortLimit(sort0=[$0], dir0=[ASC])
> HiveProject(id=[$0], int_col=[$1], year=[$2], month=[$3])
> HiveJoin(condition=[AND(=($0, $5), =($4, $6))], joinType=[inner],
> algorithm=[none], cost=[not available])
> HiveProject(id=[$0], int_col=[$1], year=[$2], month=[$3],
> CAST=[CAST($1):BIGINT])
> HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT
> NULL(CAST($1):BIGINT))])
> HiveTableScan(table=[[default, t_test1]], table:alias=[s])
> HiveProject(id=[$0], $f1=[$1])
> HiveFilter(condition=[IS NOT NULL($1)])
> HiveAggregate(group=[{0}], agg#0=[count()])
> HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[default, t_test2]], table:alias=[t]){code}
>
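The difference between the expected and the reported result can be reproduced outside Hive. The sketch below is only an illustration, assuming SQLite (through Python's sqlite3 module) as a stand-in engine; it is not Hive, and the hand-written join rewrite is not the plan the CBO actually generates. The helper name run_queries and the alias cnt are hypothetical. It runs the correlated query from the description, then a manual rewrite of it as a join against the aggregated t_test2, once with a left join and once with an inner join.

```python
import sqlite3


def run_queries():
    """Run the correlated query and two hand-written join rewrites.

    SQLite is used here only as a stand-in for Hive (assumption).
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Same tables and rows as in the issue description; the dummy table and
    # the materialized view are omitted because SQLite has no equivalent.
    cur.executescript("""
        CREATE TABLE t_test1(id int, int_col int, year int, month int);
        CREATE TABLE t_test2(id int, int_col int, year int, month int);
        INSERT INTO t_test1 VALUES (1, 1, 2009, 1), (10, 0, 2009, 1);
        INSERT INTO t_test2 VALUES (1, 1, 2009, 1);
    """)

    # The query from the description, unchanged apart from layout.
    correlated = cur.execute("""
        SELECT id, int_col, year, month FROM t_test1 s
        WHERE s.int_col = (SELECT count(*) FROM t_test2 t WHERE s.id = t.id)
        ORDER BY id
    """).fetchall()

    # Manual decorrelation: join against per-id counts. A missing match must
    # be read as a count of 0, which only the left join can preserve.
    join_sql = """
        SELECT s.id, s.int_col, s.year, s.month
        FROM t_test1 s
        {join_type} JOIN (SELECT id, count(*) AS cnt
                          FROM t_test2 GROUP BY id) c
          ON s.id = c.id
        WHERE s.int_col = COALESCE(c.cnt, 0)
        ORDER BY s.id
    """
    left = cur.execute(join_sql.format(join_type="LEFT")).fetchall()
    inner = cur.execute(join_sql.format(join_type="INNER")).fetchall()
    conn.close()
    return correlated, left, inner


if __name__ == "__main__":
    correlated, left, inner = run_queries()
    print("correlated subquery:", correlated)
    print("left join rewrite:  ", left)
    print("inner join rewrite: ", inner)
```

In this sketch the left join rewrite returns the same rows as the correlated query, while the inner join rewrite drops the row with id 10 because that id never appears in t_test2. That matches the symptom described above, where the plan contains an inner join.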