Taking a look at the explain output, the first stage of both queries is
identical.  The only difference is in the reduce side of the 2nd stage.
I'm not sure if this helps or not.
Thanks.
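
For reference, the two plans below are just the output of prefixing each
query with EXPLAIN, e.g. for the failing version:

-- same failing query as quoted below, prefixed with EXPLAIN
EXPLAIN
select m.description, o_buyers.num as buyers, count(1) as total
from clickstream_output o
  join merchandise m on (o.merchandise_id = m.merchandise_id)
  left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num
                    from clickstream_output o1
                    where o1.file_date >= '20090216' and o1.file_date <= '20090217'
                      and o1.event_id = 'buy'
                    group by o1.merchandise_id ) o_buyers
    on (o_buyers.merchandise_id = o.merchandise_id)
where o.file_date >= '20090216' and o.file_date <= '20090217'
group by m.description, o_buyers.num
limit 40;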

Working Reduce:
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
               Left Outer Join0 to 2
          condition expressions:
            0 {VALUE.0} {VALUE.1}
            1 {VALUE.0} {VALUE.1}
            2 {VALUE.0} {VALUE.1}
          Filter Operator
            predicate:
                expr: ((1 >= '20090216') and (1 <= '20090217'))
                type: boolean
            Select Operator
              expressions:
                    expr: 3
                    type: string
                    expr: 5
                    type: bigint
              Limit
                File Output Operator
                  compressed: false
                  GlobalTableId: 0
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

Failed Reduce:
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
               Left Outer Join0 to 2
          condition expressions:
            0 {VALUE.0} {VALUE.1}
            1 {VALUE.0} {VALUE.1}
            2 {VALUE.0} {VALUE.1}
          Filter Operator
            predicate:
                expr: ((1 >= '20090216') and (1 <= '20090217'))
                type: boolean
            Group By Operator
              aggregations:
                    expr: count(1)
              keys:
                    expr: 3
                    type: string
                    expr: 5
                    type: bigint
              mode: hash
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    name: binary_table
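
Within that reduce, the working plan ends in a Select Operator followed by a
Limit, while the failing one ends in a hash-mode Group By Operator keyed on
column 3 (string) and column 5 (bigint), which seems consistent with the
LongWritable-to-Text ClassCastException in the stack trace below.

One workaround I'm thinking of trying (just a sketch, not tested yet; 'joined'
is an arbitrary alias) is to push the joins into a subquery and do the group
by in an outer query, so the grouping keys come from the subquery's select
list rather than straight off the join output:

-- untested sketch: 'joined' is just an alias for the join result
select joined.description, joined.buyers, count(1) as total
from (
  select m.description as description, o_buyers.num as buyers
  from clickstream_output o
    join merchandise m on (o.merchandise_id = m.merchandise_id)
    left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num
                      from clickstream_output o1
                      where o1.file_date >= '20090216' and o1.file_date <= '20090217'
                        and o1.event_id = 'buy'
                      group by o1.merchandise_id ) o_buyers
      on (o_buyers.merchandise_id = o.merchandise_id)
  where o.file_date >= '20090216' and o.file_date <= '20090217'
) joined
group by joined.description, joined.buyers
limit 40;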



On Wed, Apr 22, 2009 at 6:43 PM, Matt Pestritto <m...@pestritto.com> wrote:

> Thanks.
> I tried that also earlier:
> FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
> buyers
>
>
> On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <a...@cloudera.com> wrote:
>
>>
>> in the group by, try this instead:
>>
>> *group by m.description, buyers* limit 40 ;
>>
>> Matt Pestritto wrote:
>>
>> Hi - I'm having a problem with a query below.  When I try to run any
>> aggregate function on a column from the sub-query, the job fails.
>> The queries and output messages are below.
>>
>> Suggestions?
>>
>> thanks in advance.
>>
>> -- works:  2 map-reduces jobs.
>> select m.description, o_buyers.num as buyers
>> from clickstream_output o
>>   join merchandise m on (o.merchandise_id = m.merchandise_id)
>>   left outer join ( select o1.merchandise_id, count(distinct
>> o1.consumer_id) as num from clickstream_output o1
>>          where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>      on (o_buyers.merchandise_id = o.merchandise_id)
>> where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;
>>
>> Successful output: PROD7362, 3
>>
>> -- fails: 3 map-reduce jobs - 2nd reduce fails.
>> select  m.description, o_buyers.num as buyers*, count(1) as total*  --
>> sum or max(o_buyers.num) and removing from group by also fails.
>> from clickstream_output o
>>   join merchandise m on (o.merchandise_id = m.merchandise_id)
>>   left outer join ( select o1.merchandise_id, count(distinct
>> o1.consumer_id) as num from clickstream_output o1
>>          where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>      on (o_buyers.merchandise_id = o.merchandise_id)
>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>> *group by m.description, o_buyers.num* limit 40 ;
>>
>>
>> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
>> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be 
>> cast to org.apache.hadoop.io.Text
>>
>>      at 
>> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
>>      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
>>      at 
>> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>>
>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
>> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be 
>> cast to org.apache.hadoop.io.Text
>>      at 
>> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)
>>
>>      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>>      at 
>> org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
>>      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>>
>>      at 
>> org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
>>      at 
>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
>>      at 
>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>
>>      at 
>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>      at 
>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>      at 
>> org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)
>>
>>      at 
>> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
>>      at 
>> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
>>
>>
>>
>>
>
