Re: selecting data where max is found

Tom Nichols Tue, 23 Feb 2010 13:42:31 -0800

I'm using Hive 0.4.1 and Hadoop 0.20.3.  Thanks guys I'll try out
those suggestions.


On Tue, Feb 23, 2010 at 4:27 PM, Zheng Shao <[email protected]> wrote:
> 1. Hive does not support "FROM a, b" kind of join. You can use "FROM a
> JOIN b ON ..".
>
> 2. You might want to use the following simpler (and faster) query:
>
> SELECT report_name, m.col1 as report_run_date, m.col2 as user_name
> FROM (
>  SELECT  report_name, max(struct(report_run_date, user_name)) as m
>  FROM report_history
> ) tmp;
>
> The struct udf is added recently. It's only in trunk.
> Zheng
>
> On Tue, Feb 23, 2010 at 1:14 PM, Tom Nichols <[email protected]> wrote:
>> Really?  It looked to me like Hive was complaining about the comma at
>> the end of "from report_history rh,"
>>
>> SELECT rh.user_name, rh.report_name, rh.report_run_date
>> FROM report_history rh,
>>   (SELECT max(report_run_date) as maxdate, report_name
>>     FROM report_history
>>     GROUP BY report_name) maxresults
>> WHERE rh.report_name = maxresults.report_name
>> AND rh.report_run_date= maxresults.maxdate;
>>
>> FAILED: Parse Error: line 2:20 mismatched input ',' expecting EOF
>>
>> I was able to perform the same task by using an intermediate table for
>> the inner query but that obviously complicates things and I'd suspect
>> makes it take longer.
>>
>> Thanks again for the help!
>> -Tom
>>
>>
>> On Tue, Feb 23, 2010 at 4:00 PM, Carl Steinbach <[email protected]> wrote:
>>> Tom,
>>>
>>> The example in the link will compile and execute on Hive without
>>> modification.
>>>
>>> Carl
>>>
>>> On Tue, Feb 23, 2010 at 12:46 PM, Tom Nichols <[email protected]> wrote:
>>>>
>>>> Hi Hive users,
>>>>
>>>> I've got a somewhat convoluted query which I'm wondering how I would
>>>> translate it to hive...  It is similar to the first FAQ example here:
>>>> http://www.techonthenet.com/sql/max.php
>>>> So following the example above, I could easily get the last run date
>>>> for each report, but how would I get the _user_ who last ran the
>>>> report?  It involves a join based on a subquery.  Conceptually this
>>>> seems like it wouldn't be difficult because I can carry all of the
>>>> columns while searching for a 'max' of one column in a M-R job...  it
>>>> just doesn't express itself in SQL quite the same way.
>>>>
>>>> Is there a way to put the result of my inner select directly into
>>>> another Hive table and then my outer select would just join on that
>>>> table?
>>>>
>>>> Thanks.
>>>
>>>
>>
>
>
>
> --
> Yours,
> Zheng
>

Re: selecting data where max is found

Reply via email to