1. Hive does not support "FROM a, b" kind of join. You can use "FROM a JOIN b ON ..".
2. You might want to use the following simpler (and faster) query: SELECT report_name, m.col1 as report_run_date, m.col2 as user_name FROM ( SELECT report_name, max(struct(report_run_date, user_name)) as m FROM report_history ) tmp; The struct udf is added recently. It's only in trunk. Zheng On Tue, Feb 23, 2010 at 1:14 PM, Tom Nichols <[email protected]> wrote: > Really? It looked to me like Hive was complaining about the comma at > the end of "from report_history rh," > > SELECT rh.user_name, rh.report_name, rh.report_run_date > FROM report_history rh, > (SELECT max(report_run_date) as maxdate, report_name > FROM report_history > GROUP BY report_name) maxresults > WHERE rh.report_name = maxresults.report_name > AND rh.report_run_date= maxresults.maxdate; > > FAILED: Parse Error: line 2:20 mismatched input ',' expecting EOF > > I was able to perform the same task by using an intermediate table for > the inner query but that obviously complicates things and I'd suspect > makes it take longer. > > Thanks again for the help! > -Tom > > > On Tue, Feb 23, 2010 at 4:00 PM, Carl Steinbach <[email protected]> wrote: >> Tom, >> >> The example in the link will compile and execute on Hive without >> modification. >> >> Carl >> >> On Tue, Feb 23, 2010 at 12:46 PM, Tom Nichols <[email protected]> wrote: >>> >>> Hi Hive users, >>> >>> I've got a somewhat convoluted query which I'm wondering how I would >>> translate it to hive... It is similar to the first FAQ example here: >>> http://www.techonthenet.com/sql/max.php >>> So following the example above, I could easily get the last run date >>> for each report, but how would I get the _user_ who last ran the >>> report? It involves a join based on a subquery. Conceptually this >>> seems like it wouldn't be difficult because I can carry all of the >>> columns while searching for a 'max' of one column in a M-R job... it >>> just doesn't express itself in SQL quite the same way. >>> >>> Is there a way to put the result of my inner select directly into >>> another Hive table and then my outer select would just join on that >>> table? >>> >>> Thanks. >> >> > -- Yours, Zheng
