Zoltan Chovan created HIVE-17334:
------------------------------------
Summary: Problem in interpreting structure columns in view.
Key: HIVE-17334
URL: https://issues.apache.org/jira/browse/HIVE-17334
Project: Hive
Issue Type: Improvement
Reporter: Zoltan Chovan
Priority: Minor
Reproduction steps, to set up the env:
{code:java}
drop view if exists test_db.view_a;
drop view if exists test_db.view_b;
drop table if exists test_db.table_a;
create database if not exists test_db;
create external table if not exists test_db.table_a (
`id` bigint,
`users` array<struct<user_id:int,counter:int>>
)
partitioned by (p_date int)
stored as parquet
location '/user/hive/';
{code}
With this, he following scenario will fail:
{code:java}
select
t.p_date,
t.id,
p.user_id,
sum(p.counter) as counter,
sum(sum(p.counter)) over
(partition by t.id) as total_actions
from test_db.table_a as t
lateral view explode(t.users) users_two as p
group by t.p_date, t.id, p.user_id;
create view if not exists test_db.view_a as
select
t.p_date,
t.id,
p.user_id,
sum(p.counter) as counter,
sum(sum(p.counter)) over
(partition by t.id) as total_actions
from test_db.table_a as t
lateral view explode(t.users) users_two as p
group by t.p_date, t.id, p.user_id;
select *
from test_db.view_a
where p_date = 20170711;
{code}
The following scenario will succeed:
{code:java}
create view if not exists test_db.view_b as
select
base.*,
sum(base.counter) over
(partition by base.id) as total_actions
from (
select
t.p_date,
t.id,
p.user_id,
sum(p.counter) as counter
from test_db.table_a as t
lateral view explode(t.users) users_two as p
group by t.p_date, t.id, p.user_id
) as base;
select *
from test_db.view_b
where p_date = 20170711;
{code}
As you can see the only difference is that the addition of "sum(sum(p.counter))
over (partition by t.id) as total_actions"
If the view is created as follows then the query that was breaking works.
{code:java}
create view if not exists test_db.view_c as
select
dt.p_date,
dt.id,
users_two.p.user_id,
sum(p.counter) as counter,
sum(sum(p.counter)) over
(partition by dt.id) as total_actions
from test_db.table_a as dt
lateral view explode(dt.users) users_two as p
group by dt.p_date, dt.id, users_two.p.user_id;
select *
from test_db.view_c
where p_date = 20170711;
{code}
The workaround is to use the table name along with the column name e.g
users_two.p.user_id. Please advise if this is a bug and if it could be fixed.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)