I'm using Hive 0.4.1 and Hadoop 0.20.3. Thanks guys I'll try out those suggestions.
On Tue, Feb 23, 2010 at 4:27 PM, Zheng Shao <[email protected]> wrote: > 1. Hive does not support "FROM a, b" kind of join. You can use "FROM a > JOIN b ON ..". > > 2. You might want to use the following simpler (and faster) query: > > SELECT report_name, m.col1 as report_run_date, m.col2 as user_name > FROM ( > SELECT report_name, max(struct(report_run_date, user_name)) as m > FROM report_history > ) tmp; > > The struct udf is added recently. It's only in trunk. > Zheng > > On Tue, Feb 23, 2010 at 1:14 PM, Tom Nichols <[email protected]> wrote: >> Really? It looked to me like Hive was complaining about the comma at >> the end of "from report_history rh," >> >> SELECT rh.user_name, rh.report_name, rh.report_run_date >> FROM report_history rh, >> (SELECT max(report_run_date) as maxdate, report_name >> FROM report_history >> GROUP BY report_name) maxresults >> WHERE rh.report_name = maxresults.report_name >> AND rh.report_run_date= maxresults.maxdate; >> >> FAILED: Parse Error: line 2:20 mismatched input ',' expecting EOF >> >> I was able to perform the same task by using an intermediate table for >> the inner query but that obviously complicates things and I'd suspect >> makes it take longer. >> >> Thanks again for the help! >> -Tom >> >> >> On Tue, Feb 23, 2010 at 4:00 PM, Carl Steinbach <[email protected]> wrote: >>> Tom, >>> >>> The example in the link will compile and execute on Hive without >>> modification. >>> >>> Carl >>> >>> On Tue, Feb 23, 2010 at 12:46 PM, Tom Nichols <[email protected]> wrote: >>>> >>>> Hi Hive users, >>>> >>>> I've got a somewhat convoluted query which I'm wondering how I would >>>> translate it to hive... It is similar to the first FAQ example here: >>>> http://www.techonthenet.com/sql/max.php >>>> So following the example above, I could easily get the last run date >>>> for each report, but how would I get the _user_ who last ran the >>>> report? It involves a join based on a subquery. Conceptually this >>>> seems like it wouldn't be difficult because I can carry all of the >>>> columns while searching for a 'max' of one column in a M-R job... it >>>> just doesn't express itself in SQL quite the same way. >>>> >>>> Is there a way to put the result of my inner select directly into >>>> another Hive table and then my outer select would just join on that >>>> table? >>>> >>>> Thanks. >>> >>> >> > > > > -- > Yours, > Zheng >
