Gopal V created HIVE-6247:
-----------------------------

             Summary: select count(distinct) should be MRR in Tez
                 Key: HIVE-6247
                 URL: https://issues.apache.org/jira/browse/HIVE-6247
             Project: Hive
          Issue Type: Bug
          Components: Tez
    Affects Versions: 0.13.0
            Reporter: Gopal V
            Assignee: Gunther Hagleitner


The MR query plan for "select count(distinct)" fires off multiple reducers, 
with a local work task to perform the final aggregation.

The Tez version fires off exactly 1 reducer for the entire data-set, which 
either chokes and dies or slows down massively.
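
To illustrate the difference in plan shape, here is a toy Python model (not Hive or Tez code). It assumes the multi-reducer plan hash-partitions rows on the distinct column, so equal values always meet in the same first-stage reducer and the final step only has to add up partial counts. The function names and the num_reducers parameter are hypothetical, made up for this sketch.

```python
def count_distinct_single_reducer(values):
    """One reducer receives every row and de-duplicates the whole data-set."""
    return len({v for v in values if v is not None})


def count_distinct_mrr(values, num_reducers=4):
    """Map -> Reduce -> Reduce: de-duplicate per partition, then sum the partials."""
    # Map side: route each non-NULL value to a partition by hash, so equal
    # values always land in the same first-stage reducer.
    partitions = [[] for _ in range(num_reducers)]
    for v in values:
        if v is not None:  # count(distinct) ignores NULLs
            partitions[hash(v) % num_reducers].append(v)

    # First-stage reducers: each one de-duplicates only its own slice.
    partial_counts = [len(set(part)) for part in partitions]

    # Final stage: a single task adds up num_reducers integers.
    return sum(partial_counts)
```

Both functions return the same count. In the second one the expensive de-duplication is spread over num_reducers tasks, and the single final task only sees one integer per partition.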

To reproduce on a TPC-DS database (the query itself is meaningless):

{code}
select count(distinct ss_net_profit) from store_sales ss join store s on 
ss.ss_store_sk = s.s_store_sk;
{code}

This spins up Map 1 and Map 2 (for the dim table and the fact table) plus 
Reducer 1, which always shows "0/1", i.e. a single reducer task.


