[jira] Commented: (HIVE-2056) Generate single MR job for multi groupby query.

Amareshwari Sriramadasu (JIRA) Tue, 15 Mar 2011 09:29:55 -0700

[ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006995#comment-13006995 ]


Amareshwari Sriramadasu commented on HIVE-2056:
-----------------------------------------------

Here is a request from one of our customers:

Here is a real example of the need to run a multi group by in a single M/R
job. In the query below, we generate two aggregates from a single fact table:
the first generates a unique count by date, and the second generates a unique
count by date and gender. We have a lot of these aggregates to build, and we
would like them done in one M/R job instead of the three below. Is it possible
to do this in Hive?

// created two intermediate tables

hive> create table test_1 (dt string, bc_cnt bigint);
OK
Time taken: 9.004 seconds
hive> create table test_2 (dt string, gender string, bc_cnt bigint);
OK

// multi group by in insert statement

hive> from fact_table f
    > insert overwrite table test_1 select dt, count(distinct id) group by dt
    > insert overwrite table test_2 select dt,gender,count(distinct id) group by dt,gender;
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks not specified. Estimated from input data size: 999
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>



Thanks

Sudhish
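To make the request concrete, here is a minimal Python sketch of how both aggregates could, in principle, come out of a single map/shuffle/reduce pass. Both GROUP BY clauses share the key dt, so rows can be shuffled once on dt and each reducer can then compute both the per-date and the per-(date, gender) distinct counts. This is an illustration of the idea only, not Hive's implementation; the (dt, gender, id) row layout and the function names map_phase and reduce_phase are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout for fact_table: (dt, gender, id)
Row = Tuple[str, str, str]


def map_phase(rows: Iterable[Row]) -> Dict[str, List[Tuple[str, str]]]:
    """Emit each row once, keyed by the shared group-by prefix `dt`.

    The returned dict stands in for the shuffle: all values for one dt
    end up in the same "reducer" bucket.
    """
    shuffled: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for dt, gender, uid in rows:
        shuffled[dt].append((gender, uid))
    return shuffled


def reduce_phase(shuffled: Dict[str, List[Tuple[str, str]]]):
    """Compute both aggregates from the same reducer input."""
    test_1: List[Tuple[str, int]] = []       # (dt, count(distinct id))
    test_2: List[Tuple[str, str, int]] = []  # (dt, gender, count(distinct id))
    for dt in sorted(shuffled):
        values = shuffled[dt]
        # Aggregate 1: distinct ids per date
        test_1.append((dt, len({uid for _, uid in values})))
        # Aggregate 2: distinct ids per (date, gender), from the same values
        by_gender: Dict[str, set] = defaultdict(set)
        for gender, uid in values:
            by_gender[gender].add(uid)
        for gender in sorted(by_gender):
            test_2.append((dt, gender, len(by_gender[gender])))
    return test_1, test_2


if __name__ == "__main__":
    rows = [
        ("2011-03-01", "M", "a"),
        ("2011-03-01", "M", "a"),
        ("2011-03-01", "F", "b"),
        ("2011-03-01", "M", "c"),
        ("2011-03-02", "F", "d"),
    ]
    t1, t2 = reduce_phase(map_phase(rows))
    print(t1)  # [('2011-03-01', 3), ('2011-03-02', 1)]
    print(t2)  # [('2011-03-01', 'F', 1), ('2011-03-01', 'M', 2), ('2011-03-02', 'F', 1)]
```

The input is read and shuffled once, and both outputs are derived from the same reducer input. This only works because dt is a common prefix of both grouping keys; aggregates with unrelated keys would need a different strategy.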



> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
