Hi Jeff,

Thanks for the suggestion. That's a good idea. I just added that to
http://wiki.apache.org/hadoop/Hive/FAQ

Zheng
On Wed, Apr 22, 2009 at 8:22 PM, Jeff Hammerbacher <ham...@cloudera.com> wrote:

> Hey Zheng,
>
> Thanks for the insight. Perhaps these sorts of quirks could be added to
> the documentation on the wiki?
>
> Thanks,
> Jeff
>
> On Wed, Apr 22, 2009 at 6:35 PM, Zheng Shao <zsh...@gmail.com> wrote:
>
>> Hi Matt,
>>
>> There is a known bug with JOIN: all output columns of a JOIN become
>> STRING. The stack trace shows the failure is exactly because of that bug.
>>
>> The workaround right now is to wrap every column that is not a STRING
>> in CAST(xxx AS STRING):
>>
>> select m.description, o_buyers.num as buyers, count(1) as total
>> from clickstream_output o
>> join merchandise m on (CAST(o.merchandise_id AS STRING) =
>>     CAST(m.merchandise_id AS STRING))
>> left outer join ( select CAST(o1.merchandise_id AS STRING) as
>>     merchandise_id, CAST(count(distinct o1.consumer_id) AS STRING) as num
>>     from clickstream_output o1
>>     where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>>     and o1.event_id = 'buy'
>>     group by CAST(o1.merchandise_id AS STRING) ) o_buyers
>>   on (CAST(o_buyers.merchandise_id AS STRING) =
>>       CAST(o.merchandise_id AS STRING))
>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>> group by m.description, o_buyers.num limit 40 ;
>>
>> It will be fixed by https://issues.apache.org/jira/browse/HIVE-405.
>>
>> Zheng
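Reduced to a minimal sketch, the workaround pattern is: every non-STRING
column that flows through a JOIN gets an explicit CAST, so the STRING type
the JOIN assigns in the plan matches the values actually produced at
runtime. The tables t1 and t2 below are hypothetical, with INT id columns:

    -- Without the CASTs, the plan types the JOIN outputs as STRING while
    -- the runtime values stay LongWritable, which is what produces the
    -- ClassCastException quoted later in this thread.
    select t1.name, count(1) as total
    from t1
    join t2 on (CAST(t1.id AS STRING) = CAST(t2.id AS STRING))
    group by t1.name;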
>>
>> On Wed, Apr 22, 2009 at 5:09 PM, Ashish Thusoo <athu...@facebook.com> wrote:
>>
>>> Can you do an explain <query> and send us the plan?
>>>
>>> Another thing you may try is to put the entire subquery in the from
>>> clause and then do the aggregation on it, i.e.:
>>>
>>> select ..., count(1)
>>> from (select ...
>>>       from clickstream_output o join .....) t
>>> group by t.description, t.num limit 40;
>>>
>>> Also, are you using the 0.3.0 release candidate?
>>>
>>> Ashish
>>>
>>> ________________________________________
>>> From: Matt Pestritto [m...@pestritto.com]
>>> Sent: Wednesday, April 22, 2009 3:53 PM
>>> To: hive-user@hadoop.apache.org
>>> Subject: Re: Aggregate Query Fails.
>>>
>>> So the only change was to remove the column alias, correct?
>>> Still no luck. Same result.
>>>
>>> On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <pcha...@facebook.com> wrote:
>>>
>>> That is strange... Does the below also fail?
>>>
>>> select m.description, o_buyers.num, count(1) as total
>>> from clickstream_output o
>>> join merchandise m on (o.merchandise_id = m.merchandise_id)
>>> left outer join ( select o1.merchandise_id, count(distinct
>>>     o1.consumer_id) as num from clickstream_output o1
>>>     where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>>>     and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>>   on (o_buyers.merchandise_id = o.merchandise_id)
>>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>>> group by m.description, o_buyers.num limit 40 ;
>>>
>>> ________________________________
>>> From: Matt Pestritto <m...@pestritto.com>
>>> Reply-To: <hive-user@hadoop.apache.org>
>>> Date: Wed, 22 Apr 2009 15:43:40 -0700
>>> To: <hive-user@hadoop.apache.org>
>>> Subject: Re: Aggregate Query Fails.
>>>
>>> Thanks.
>>> I tried that also earlier:
>>> FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
>>> buyers
>>>
>>> On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <a...@cloudera.com> wrote:
>>>
>>> In the group by, try this instead:
>>>
>>> group by m.description, buyers limit 40 ;
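Hive at this stage resolves GROUP BY against the underlying expressions,
not the select-list aliases, which is why grouping by the alias produced
the "Invalid Column Reference buyers" error above. A minimal sketch,
assuming a hypothetical table t with columns k and v:

    -- Fails at semantic analysis: the alias grp is not visible in GROUP BY.
    --   select k as grp, count(1) as total from t group by grp;

    -- Works: repeat the underlying column instead.
    select k as grp, count(1) as total from t group by k;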
>>>
>>> Matt Pestritto wrote:
>>>
>>> Hi -
>>>
>>> I'm having a problem with the query below. When I try to run any
>>> aggregate function on a column from the sub-query, the job fails.
>>> The queries and output messages are below.
>>>
>>> Suggestions? Thanks in advance.
>>>
>>> -- works: 2 map-reduce jobs.
>>> select m.description, o_buyers.num as buyers
>>> from clickstream_output o
>>> join merchandise m on (o.merchandise_id = m.merchandise_id)
>>> left outer join ( select o1.merchandise_id, count(distinct
>>>     o1.consumer_id) as num from clickstream_output o1
>>>     where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>>>     and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>>   on (o_buyers.merchandise_id = o.merchandise_id)
>>> where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;
>>>
>>> Successful output: PROD7362, 3
>>>
>>> -- fails: 3 map-reduce jobs, the 2nd reduce fails. Using sum() or
>>> -- max(o_buyers.num) and removing it from the group by also fails.
>>> select m.description, o_buyers.num as buyers, count(1) as total
>>> from clickstream_output o
>>> join merchandise m on (o.merchandise_id = m.merchandise_id)
>>> left outer join ( select o1.merchandise_id, count(distinct
>>>     o1.consumer_id) as num from clickstream_output o1
>>>     where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>>>     and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>>   on (o_buyers.merchandise_id = o.merchandise_id)
>>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>>> group by m.description, o_buyers.num limit 40 ;
>>>
>>> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
>>> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
>>> be cast to org.apache.hadoop.io.Text
>>>     at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
>>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
>>> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
>>> be cast to org.apache.hadoop.io.Text
>>>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)
>>>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>>>     at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
>>>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>>>     at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
>>>     at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
>>>     at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>>     at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>>     at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>>     at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)
>>>     at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
>>>     at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
>>
>> --
>> Yours,
>> Zheng

--
Yours,
Zheng
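As a footnote, Ashish's suggested restructuring (the whole join moved into
a FROM-clause subquery, with the aggregation on top) would look roughly
like this for Matt's query. This is a sketch only, not run against these
tables; per Zheng's diagnosis it may still need the CAST workaround until
HIVE-405 lands:

    select t.description, t.num as buyers, count(1) as total
    from ( select m.description as description, o_buyers.num as num
           from clickstream_output o
           join merchandise m on (o.merchandise_id = m.merchandise_id)
           left outer join ( select o1.merchandise_id, count(distinct
               o1.consumer_id) as num from clickstream_output o1
               where o1.file_date >= '20090216' and o1.file_date <= '20090217'
               and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
             on (o_buyers.merchandise_id = o.merchandise_id)
           where o.file_date >= '20090216' and o.file_date <= '20090217' ) t
    group by t.description, t.num limit 40;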