RE: question about mapred translation

Ashish Thusoo Mon, 22 Dec 2008 10:59:08 -0800

Hi Shirley,

I think this query would give you an error currently, for two reasons:


1. The select list does not contain the group by column, and "distinct track" 
combined with "group by user" is indeterminate: which value of track do you 
want to report for each group of users?
2. We do not have order by yet, though you can simulate it with a sort by 
clause and 1 reducer.
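
As a rough sketch of the second point, and assuming the intent is a count of 
tracks per user, something like the following might work. The alias 
track_count and the subquery shape are illustrative assumptions, the where 
clause is copied from the original query unchanged, and whether every piece 
is accepted depends on the Hive version in use.

```sql
-- Force a single reducer so that SORT BY produces one fully ordered output.
-- mapred.reduce.tasks is the Hadoop job property for the reducer count.
SET mapred.reduce.tasks=1;

-- Assumed intent: number of tracks listened to per user, highest first.
-- The select list now contains the group by column (user).
SELECT t.user, t.track_count
FROM (
  SELECT user, count(track) AS track_count
  FROM music
  WHERE date_listened BETWEEN '12-10-2008' AND '12-11-2008'
  GROUP BY user
) t
SORT BY t.track_count DESC;
```

With more than one reducer, sort by only orders rows within each reducer's 
output, which is why the single reducer is needed to imitate order by.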

Can you explain more in terms of what you want the SQL to achieve?

Also, the Hive wiki contains a number of presentations that describe how SQL 
gets converted to Map/Reduce plans at a high level. Check them out at

http://wiki.apache.org/hadoop/Hive/Presentations

Ashish

________________________________
From: Jeff Hammerbacher [mailto:[email protected]]
Sent: Saturday, December 20, 2008 11:35 AM
To: [email protected]
Subject: Re: question about mapred translation

Hey Shirley,

Welcome to Hive! Once you've gotten Hive up and running and have created the 
"music" table, you should be able to say "EXPLAIN <query>", where <query> is 
the query specified below (or any other query). For more detailed information, 
you can say "EXPLAIN EXTENDED <query>".
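
For instance, a minimal sketch of the two forms, using a simplified per-user 
count as the query (an assumption about the intent, not the exact query 
below):

```sql
-- Show the plan (stages and operators) without running the query.
EXPLAIN
SELECT user, count(track)
FROM music
GROUP BY user;

-- Same plan with additional detail.
EXPLAIN EXTENDED
SELECT user, count(track)
FROM music
GROUP BY user;
```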

The output from the EXPLAIN might be a little opaque, so if you still have 
questions, I'm sure more knowledgeable Hive folks can give you insight into the 
planner.

Regards,
Jeff

On Sat, Dec 20, 2008 at 11:48 AM, Shirley Cohen <[email protected]> wrote:
Hi,

I'm just getting started with Hive. I was wondering if anyone can tell me how 
Hive would translate the following SQL query into MapReduce:

select distinct track, count(track)
from music
where date_listened between '12-10-2008' and '12-11-2008'
group by user
order by count(track) desc

How many mapred jobs would it use? What would the map and reduce functions look 
like? Also, does Hive have a utility that gives you this information?

Thanks,

Shirley
