That is strange... Does below also fail? select m.description, o_buyers.num , count(1) as total from clickstream_output o join merchandise m on (o.merchandise_id = m.merchandise_id) left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1 where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers on (o_buyers.merchandise_id = o.merchandise_id) where o.file_date >= '20090216' and o.file_date <= '20090217' group by m.description, o_buyers.num limit 40 ;
________________________________ From: Matt Pestritto <m...@pestritto.com> Reply-To: <hive-user@hadoop.apache.org> Date: Wed, 22 Apr 2009 15:43:40 -0700 To: <hive-user@hadoop.apache.org> Subject: Re: Aggregrate Query Fails. Thanks. I tried that also earlier: FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference buyers On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <a...@cloudera.com> wrote: in the group by, try this instead: group by m.description, buyers limit 40 ; Matt Pestritto wrote: Hi - I'm having a problem with a query below. When I try to run any aggregate function on a column from the sub-query, the job fails. The queries and output messages are below. Suggestions? thanks in advance. -- works: 2 map-reduces jobs. select m.description, o_buyers.num as buyers from clickstream_output o join merchandise m on (o.merchandise_id = m.merchandise_id) left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1 where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers on (o_buyers.merchandise_id = o.merchandise_id) where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40; Successful output: PROD7362, 3 -- fails: 3 map-reduce jobs - 2nd reduce fails. select m.description, o_buyers.num as buyers, count(1) as total -- sum or max(o_buyers.num) and removing from group by also fails. from clickstream_output o join merchandise m on (o.merchandise_id = m.merchandise_id) left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1 where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers on (o_buyers.merchandise_id = o.merchandise_id) where o.file_date >= '20090216' and o.file_date <= '20090217' group by m.description, o_buyers.num limit 40 ; java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353) at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353) at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)