Hi Jeff,

Thanks for the suggestion. That's a good idea.
I just added that to http://wiki.apache.org/hadoop/Hive/FAQ


Zheng

On Wed, Apr 22, 2009 at 8:22 PM, Jeff Hammerbacher <ham...@cloudera.com> wrote:

> Hey Zheng,
>
> Thanks for the insight. Perhaps these sorts of quirks could be added to the
> documentation on the wiki?
>
> Thanks,
> Jeff
>
>
> On Wed, Apr 22, 2009 at 6:35 PM, Zheng Shao <zsh...@gmail.com> wrote:
>
>> Hi Matt,
>>
>> There is a known bug with JOIN: all output columns from a JOIN become
>> STRING.
>> The stacktrace shows this is exactly because of that bug: o_buyers.num is
>> really a BIGINT (LongWritable), but after the JOIN the plan treats it as
>> a STRING (Text), hence the ClassCastException.
>>
>> The workaround right now is to wrap every column whose type is not
>> already STRING in CAST(xxx AS STRING).
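>>
>> For instance, a minimal sketch of the pattern (the table t and its INT
>> column id are hypothetical, just to show the shape):
>>
>>   -- Wrap every non-STRING column, so the declared types match the
>>   -- STRINGs that the JOIN actually emits.
>>   select CAST(a.id AS STRING) as id
>>   from t a join t b on (CAST(a.id AS STRING) = CAST(b.id AS STRING));
>>
>> Applied to your query: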
>>
>>
>> select m.description, o_buyers.num as buyers, count(1) as total
>> from clickstream_output o
>>   join merchandise m
>>     on (CAST(o.merchandise_id AS STRING) = CAST(m.merchandise_id AS STRING))
>>   left outer join ( select CAST(o1.merchandise_id AS STRING) as merchandise_id,
>>                            CAST(count(distinct o1.consumer_id) AS STRING) as num
>>                     from clickstream_output o1
>>                     where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>>                       and o1.event_id = 'buy'
>>                     group by CAST(o1.merchandise_id AS STRING) ) o_buyers
>>     on (CAST(o_buyers.merchandise_id AS STRING) = CAST(o.merchandise_id AS STRING))
>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>> group by m.description, o_buyers.num limit 40 ;
>>
>>
>> It will be fixed by https://issues.apache.org/jira/browse/HIVE-405.
>>
>> Zheng
>>
>>
>> On Wed, Apr 22, 2009 at 5:09 PM, Ashish Thusoo <athu...@facebook.com> wrote:
>>
>>> Can you do an explain <query> and send us the plan?
>>>
>>> Another thing that you may try is to put the entire subquery in the from
>>> clause and then do an aggregation on it.
>>>
>>> i.e.
>>>
>>> select ..., count(1)
>>> from (select ...
>>>         from clickstream_output o join .....) t
>>> group by t.description, t.num limit 40;
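>>>
>>> Fleshed out against your failing query, that might look roughly like the
>>> following (untested sketch, reusing your tables and filters):
>>>
>>> select t.description, t.num as buyers, count(1) as total
>>> from ( select m.description, o_buyers.num
>>>        from clickstream_output o
>>>          join merchandise m on (o.merchandise_id = m.merchandise_id)
>>>          left outer join ( select o1.merchandise_id,
>>>                                   count(distinct o1.consumer_id) as num
>>>                            from clickstream_output o1
>>>                            where o1.file_date >= '20090216'
>>>                              and o1.file_date <= '20090217'
>>>                              and o1.event_id = 'buy'
>>>                            group by o1.merchandise_id ) o_buyers
>>>            on (o_buyers.merchandise_id = o.merchandise_id)
>>>        where o.file_date >= '20090216' and o.file_date <= '20090217' ) t
>>> group by t.description, t.num limit 40;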
>>>
>>> Also, are you using the 0.3.0 release candidate?
>>>
>>> Ashish
>>>
>>> ________________________________________
>>> From: Matt Pestritto [m...@pestritto.com]
>>> Sent: Wednesday, April 22, 2009 3:53 PM
>>> To: hive-user@hadoop.apache.org
>>> Subject: Re: Aggregate Query Fails.
>>>
>>> So the only change was to remove the column alias, correct?
>>> Still no luck. Same result.
>>>
>>> On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <pcha...@facebook.com> wrote:
>>> That is strange... Does the query below also fail?
>>>
>>> select m.description, o_buyers.num, count(1) as total
>>> from clickstream_output o
>>>   join merchandise m on (o.merchandise_id = m.merchandise_id)
>>>   left outer join ( select o1.merchandise_id,
>>>                            count(distinct o1.consumer_id) as num
>>>                     from clickstream_output o1
>>>                     where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>>>                       and o1.event_id = 'buy'
>>>                     group by o1.merchandise_id ) o_buyers
>>>     on (o_buyers.merchandise_id = o.merchandise_id)
>>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>>> group by m.description, o_buyers.num limit 40 ;
>>>
>>>
>>> ________________________________
>>> From: Matt Pestritto <m...@pestritto.com>
>>> Reply-To: <hive-user@hadoop.apache.org>
>>> Date: Wed, 22 Apr 2009 15:43:40 -0700
>>> To: <hive-user@hadoop.apache.org>
>>> Subject: Re: Aggregate Query Fails.
>>>
>>>
>>> Thanks.
>>> I tried that earlier as well:
>>> FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
>>> buyers
>>>
>>> On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <a...@cloudera.com> wrote:
>>>
>>>
>>> In the group by, try this instead:
>>>
>>> group by m.description, buyers limit 40 ;
>>>
>>> Matt Pestritto wrote:
>>> Hi - I'm having a problem with the query below. When I try to run any
>>> aggregate function on a column from the sub-query, the job fails.
>>> The queries and output messages are below.
>>>
>>> Suggestions?
>>>
>>> thanks in advance.
>>>
>>> -- works: 2 map-reduce jobs.
>>> select m.description, o_buyers.num as buyers
>>> from clickstream_output o
>>>   join merchandise m on (o.merchandise_id = m.merchandise_id)
>>>   left outer join ( select o1.merchandise_id,
>>>                            count(distinct o1.consumer_id) as num
>>>                     from clickstream_output o1
>>>                     where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>>>                       and o1.event_id = 'buy'
>>>                     group by o1.merchandise_id ) o_buyers
>>>     on (o_buyers.merchandise_id = o.merchandise_id)
>>> where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;
>>>
>>> Successful output: PROD7362, 3
>>>
>>> -- fails: 3 map-reduce jobs; the 2nd reduce fails.
>>> select m.description, o_buyers.num as buyers, count(1) as total
>>>   -- using sum(o_buyers.num) or max(o_buyers.num) and removing it from
>>>   -- the group by also fails.
>>> from clickstream_output o
>>>   join merchandise m on (o.merchandise_id = m.merchandise_id)
>>>   left outer join ( select o1.merchandise_id,
>>>                            count(distinct o1.consumer_id) as num
>>>                     from clickstream_output o1
>>>                     where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>>>                       and o1.event_id = 'buy'
>>>                     group by o1.merchandise_id ) o_buyers
>>>     on (o_buyers.merchandise_id = o.merchandise_id)
>>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>>> group by m.description, o_buyers.num limit 40 ;
>>>
>>>
>>>
>>> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
>>> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
>>> cast to org.apache.hadoop.io.Text
>>>   at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
>>>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
>>>   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
>>> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
>>> cast to org.apache.hadoop.io.Text
>>>   at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)
>>>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>>>   at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
>>>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>>>   at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
>>>   at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
>>>   at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>>   at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>>   at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>>   at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)
>>>   at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
>>>   at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
>>>
>>
>>
>> --
>> Yours,
>> Zheng
>>
>
>


-- 
Yours,
Zheng
