Feng Yuan created HIVE-11930:
--------------------------------
Summary: how to prevent ppd the topN(a) udf predication in where clause?
Key: HIVE-11930
URL: https://issues.apache.org/jira/browse/HIVE-11930
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 0.14.0
Reporter: Feng Yuan
Priority: Blocker
Fix For: 0.14.1
select a.state_date, a.customer, a.taskid, a.step_id, a.exit_title, a.pv,
       top1000(a.only_id)
from (
  select t1.state_date, t1.customer, t1.taskid, t1.step_id, t1.exit_title,
         t1.pv, t1.only_id
  from (
    select t11.state_date,
           t11.customer,
           t11.taskid,
           t11.step_id,
           t11.exit_title,
           t11.pv,
           concat(t11.customer, t11.taskid, t11.step_id) as only_id
    from (
      select state_date, customer, taskid, step_id, exit_title, count(*) as pv
      from bdi_fact2.mid_url_step
      where exit_url != '-1'
        and exit_title != '-1'
        and l_date = '2015-08-31'
      group by state_date, customer, taskid, step_id, exit_title
    ) t11
  ) t1
  order by t1.only_id, t1.pv desc
) a
where a.customer = 'Cdianyingwang'
  and a.taskid = '33'
  and a.step_id = '0'
  and top1000(a.only_id) <= 10;

In the example above, the outer predicate top1000(a.only_id) <= 10 is pushed
down (PPD) into the following subquery:

Stage 1:
  (
    select t11.state_date,
           t11.customer,
           t11.taskid,
           t11.step_id,
           t11.exit_title,
           t11.pv,
           concat(t11.customer, t11.taskid, t11.step_id) as only_id
    from (
      select state_date, customer, taskid, step_id, exit_title, count(*) as pv
      from bdi_fact2.mid_url_step
      where exit_url != '-1'
        and exit_title != '-1'
        and l_date = '2015-08-31'
      group by state_date, customer, taskid, step_id, exit_title
    ) t11
  ) t1
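
This stage runs with 2 reducers, and the predicate is evaluated separately in
each of them, so the stage outputs 20 records (10 from each reducer) instead of
10. By the time they reach the outer stage, the final result is exactly these
20 records.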
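
The effect described above can be reproduced outside Hive with a small
simulation. This is only a sketch, not Hive code: it assumes that top1000
behaves like a stateful rank counter (1, 2, 3, ... for consecutive rows with
the same key) whose state is local to each task. The helper names
make_rank_udf and filter_top_n, and the sample rows, are hypothetical.

```python
def make_rank_udf():
    """Hypothetical stand-in for top1000: ranks consecutive rows per key.

    The counter lives inside one task, which is the assumption that makes
    the pushed-down predicate behave differently from the outer one.
    """
    state = {"key": None, "rank": 0}

    def rank(key):
        if key != state["key"]:
            # New key: restart the counter.
            state["key"] = key
            state["rank"] = 0
        state["rank"] += 1
        return state["rank"]

    return rank


def filter_top_n(rows, n):
    """Keep rows whose rank is <= n, using a fresh UDF instance (one task)."""
    rank = make_rank_udf()
    return [row for row in rows if rank(row[0]) <= n]


# 40 rows that share one only_id and differ in pv (sorted by pv descending).
rows = [("Cdianyingwang330", pv) for pv in range(40, 0, -1)]

# Intended behavior: the filter runs once, over the fully ordered output.
intended = filter_top_n(rows, 10)

# Pushed-down behavior: the rows are spread over 2 reducers (the group-by key
# includes exit_title, so one only_id can land on both), and each reducer
# applies the filter with its own counter.
partitions = [rows[0::2], rows[1::2]]
pushed_down = [row for part in partitions for row in filter_top_n(part, 10)]

print(len(intended), len(pushed_down))
```

Under these assumptions the single-task filter keeps 10 rows, while the
per-reducer filter keeps 10 rows in each of the 2 partitions, which matches
the 20 records seen in the report.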

So I would like to know: is there any way to hint that this topN UDF predicate
should not be pushed down?

Thanks