The documentation appears to state the following: "Multiple aggregations can be done at the same time, however, no two aggregations can have different DISTINCT columns .e.g while the following is possible "
" INSERT OVERWRITE TABLE pv_gender_agg SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(1), sum(DISTINCT pv_users.userid) FROM pv_users GROUP BY pv_users.gender; However, the following query is not allowed. We don't allow multiple DISTINCT expressions in the same query. INSERT OVERWRITE TABLE pv_gender_agg SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(DISTINCT pv_users.ip) FROM pv_users GROUP BY pv_users.gender;" http://wiki.apache.org/hadoop/Hive/LanguageManual/GroupBy Is there an effort underway to allow multiple DISTINCT expressions in the same query in the (near) future as well? Thanks & regards, Avram -----Original Message----- From: Amr Awadallah [mailto:[email protected]] Sent: Saturday, July 04, 2009 12:02 AM To: [email protected] Subject: Re: aggregations over multiple columns? Mike, This is a valid query, group by over multiple columns works in hive. -- amr Michael E. Driscoll wrote: > Hi HIVErs, > > I'm trying to perform the following aggregation query in HIVE, which > finds the largest purchase for all combinations of customer and store: > > SELECT customer, store, max(purchasePrice) > FROM transactions > GROUP BY customer, store > > If aggregation over multiple columns is not currently supported, how > might I reformulate this to work in HIVE, possibly via a simpler > series of queries? > > (I will post the exact error and reproducible code if it turns out > this query is valid). > > regards, > > Mike > > b: www.dataspora.com/blog > t: www.twitter.com/dataspora >
