Hey Zheng, Thanks for the insight. Perhaps these sorts of quirks could be added to the documentation on the wiki?
Thanks, Jeff On Wed, Apr 22, 2009 at 6:35 PM, Zheng Shao <zsh...@gmail.com> wrote: > Hi Matt, > > There is a known bug with JOIN - all output columns from JOIN will become > STRING. > The stacktrace shows this is exactly because of that bug. > > The workaround right now is to replace all columns with CAST(xxx as > STRING), if the column type is not a STRING. > > > select m.description, o_buyers.num as buyers*, count(1) as total* > from clickstream_output o > join merchandise m on (CAST(o.merchandise_id AS STRING) = > CAST(m.merchandise_id AS STRING)) > left outer join ( select CAST(o1.merchandise_id AS STRING) as > merchandise_id, CAST(count(distinct o1.consumer_id) AS STRING) as num from > clickstream_output o1 > where o1.file_date >= '20090216' and o1.file_date <= '20090217' > and o1.event_id = 'buy' group by CAST(o1.merchandise_id AS STRING) ) > o_buyers > on (CAST(o_buyers.merchandise_id AS STRING) = CAST(o.merchandise_id AS > STRING)) > where o.file_date >= '20090216' and o.file_date <= '20090217' > *group by m.description, o_buyers.num* limit 40 ; > > > It will be fixed by https://issues.apache.org/jira/browse/HIVE-405. > > Zheng > > > On Wed, Apr 22, 2009 at 5:09 PM, Ashish Thusoo <athu...@facebook.com>wrote: > >> Can you do an explain <query> and send us the plan. >> >> Another thing that you may try is to put the entire subquery in the from >> clause and then do an aggregation on it.. >> >> i.e. >> >> select ..., count(1) >> from (select ... >> from clickstream_output o join .....) t >> group by t.description, t.num limit 40; >> >> Also are you using the 0.3.0 release candidate? >> >> Ashish >> >> ________________________________________ >> From: Matt Pestritto [m...@pestritto.com] >> Sent: Wednesday, April 22, 2009 3:53 PM >> To: hive-user@hadoop.apache.org >> Subject: Re: Aggregrate Query Fails. >> >> So the only change was to remove the column alias correct? >> Still no luck. Same result. >> >> On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <pcha...@facebook.com >> <mailto:pcha...@facebook.com>> wrote: >> That is strange... Does below also fail? >> >> select m.description, o_buyers.num , count(1) as total >> from clickstream_output o >> join merchandise m on (o.merchandise_id = m.merchandise_id) >> left outer join ( select o1.merchandise_id, count(distinct >> o1.consumer_id) as num from clickstream_output o1 >> where o1.file_date >= '20090216' and o1.file_date <= '20090217' >> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers >> on (o_buyers.merchandise_id = o.merchandise_id) >> where o.file_date >= '20090216' and o.file_date <= '20090217' >> group by m.description, o_buyers.num limit 40 ; >> >> >> ________________________________ >> From: Matt Pestritto <m...@pestritto.com<http://m...@pestritto.com>> >> Reply-To: <hive-user@hadoop.apache.org<http://hive-user@hadoop.apache.org >> >> >> Date: Wed, 22 Apr 2009 15:43:40 -0700 >> To: <hive-user@hadoop.apache.org<http://hive-user@hadoop.apache.org>> >> Subject: Re: Aggregrate Query Fails. >> >> >> Thanks. >> I tried that also earlier: >> FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference >> buyers >> >> On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <a...@cloudera.com< >> http://a...@cloudera.com>> wrote: >> >> >> in the group by, try this instead: >> >> group by m.description, buyers limit 40 ; >> >> Matt Pestritto wrote: >> Hi - I'm having a problem with a query below. When I try to run any >> aggregate function on a column from the sub-query, the job fails. >> The queries and output messages are below. >> >> Suggestions? >> >> thanks in advance. >> >> -- works: 2 map-reduces jobs. >> select m.description, o_buyers.num as buyers >> from clickstream_output o >> join merchandise m on (o.merchandise_id = m.merchandise_id) >> left outer join ( select o1.merchandise_id, count(distinct >> o1.consumer_id) as num from clickstream_output o1 >> where o1.file_date >= '20090216' and o1.file_date <= '20090217' >> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers >> on (o_buyers.merchandise_id = o.merchandise_id) >> where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40; >> >> Successful output: PROD7362, 3 >> >> -- fails: 3 map-reduce jobs - 2nd reduce fails. >> select m.description, o_buyers.num as buyers, count(1) as total -- sum >> or max(o_buyers.num) and removing from group by also fails. >> from clickstream_output o >> join merchandise m on (o.merchandise_id = m.merchandise_id) >> left outer join ( select o1.merchandise_id, count(distinct >> o1.consumer_id) as num from clickstream_output o1 >> where o1.file_date >= '20090216' and o1.file_date <= '20090217' >> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers >> on (o_buyers.merchandise_id = o.merchandise_id) >> where o.file_date >= '20090216' and o.file_date <= '20090217' >> group by m.description, o_buyers.num limit 40 ; >> >> >> >> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: >> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be >> cast to org.apache.hadoop.io.Text >> >> at >> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172) >> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391) >> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122) >> >> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: >> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be >> cast to org.apache.hadoop.io.Text >> at >> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489) >> >> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353) >> at >> org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69) >> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353) >> >> at >> org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273) >> at >> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516) >> at >> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) >> >> at >> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) >> at >> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) >> at >> org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546) >> >> at >> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528) >> at >> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143) >> >> >> >> >> >> > > > -- > Yours, > Zheng >