Gopal V created HIVE-6247:
-----------------------------
Summary: select count(distinct) should be MRR in Tez
Key: HIVE-6247
URL: https://issues.apache.org/jira/browse/HIVE-6247
Project: Hive
Issue Type: Bug
Components: Tez
Affects Versions: 0.13.0
Reporter: Gopal V
Assignee: Gunther Hagleitner
The MR query plan for "select count(distinct)" fires off multiple reducers,
with a local work task to perform the final aggregation.
The Tez version fires off exactly 1 reducer for the entire data set, which
either fails outright or slows down massively.
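For illustration, the two-stage shape described above can be approximated by hand in HiveQL: de-duplicate in a subquery, then count the rows that remain. This is only a sketch of a manual rewrite, not the fix proposed for this issue. It assumes the TPC-DS table store_sales and its column ss_net_profit; the alias t is arbitrary, and whether the planner assigns more than one reducer to the inner stage depends on the Hive version and settings.
{code}
-- Stage 1 (inner query): de-duplicate the values; this stage may be spread across reducers.
-- Stage 2 (outer query): count the distinct values that remain.
-- count(t.ss_net_profit) skips NULL, matching the semantics of count(distinct ...).
select count(t.ss_net_profit)
from (select distinct ss_net_profit from store_sales) t;
{code}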
To reproduce on a TPC-DS database (the query itself is meaningless):
{code}
select count(distinct ss_net_profit) from store_sales ss join store s on
ss.ss_store_sk = s.s_store_sk;
{code}
This spins up Map 1 and Map 2 (for the dimension table and the fact table) and
Reducer 1, which always shows "0/1", i.e. a single reducer task.
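One way to inspect the vertex layout without running the job is to ask Hive for the plan. A minimal sketch, assuming a session where Tez is available; hive.execution.engine is the standard engine switch, and the exact plan output varies by version:
{code}
-- Select the Tez engine, then print the plan instead of executing the query.
set hive.execution.engine=tez;
explain
select count(distinct ss_net_profit) from store_sales ss join store s on
ss.ss_store_sk = s.s_store_sk;
{code}
Setting the engine to mr and repeating the explain gives the MR plan for comparison.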