RE: aggregations over multiple columns?

Avram Aelony Mon, 06 Jul 2009 09:18:36 -0700

The documentation appears to state the following:

"Multiple aggregations can be done at the same time, however, no two 
aggregations can have different DISTINCT columns .e.g while the following is 
possible "

"  INSERT OVERWRITE TABLE pv_gender_agg
  SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(1), 
sum(DISTINCT pv_users.userid)
  FROM pv_users
  GROUP BY pv_users.gender;
However, the following query is not allowed. We don't allow multiple DISTINCT 
expressions in the same query.

  INSERT OVERWRITE TABLE pv_gender_agg
  SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(DISTINCT 
pv_users.ip)
  FROM pv_users
  GROUP BY pv_users.gender;"

http://wiki.apache.org/hadoop/Hive/LanguageManual/GroupBy

Is there an effort underway to allow multiple DISTINCT expressions in the same 
query in the (near) future as well?

Thanks & regards,
Avram

-----Original Message-----
From: Amr Awadallah [mailto:[email protected]] 
Sent: Saturday, July 04, 2009 12:02 AM
To: [email protected]
Subject: Re: aggregations over multiple columns?

Mike,

  This is a valid query, group by over multiple columns works in hive.

-- amr

Michael E. Driscoll wrote:
> Hi HIVErs,
>
> I'm trying to perform the following aggregation query in HIVE, which
> finds the largest purchase for all combinations of customer and store:
>
>   SELECT customer, store, max(purchasePrice)
>   FROM transactions
>   GROUP BY customer, store
>
> If aggregation over multiple columns is not currently supported, how
> might I reformulate this to work in HIVE, possibly via a simpler
> series of queries?
>
> (I will post the exact error and reproducible code if it turns out
> this query is valid).
>
> regards,
>
> Mike
>
> b: www.dataspora.com/blog
> t: www.twitter.com/dataspora
>

RE: aggregations over multiple columns?

Reply via email to