Re: question about mapred translation

Shirley Cohen Mon, 22 Dec 2008 11:11:17 -0800

Hi Jeff and Ashish,

Thanks for your response. Basically, what I was curious about is how Hive implements group by operations. Does it do so in one or two MapReduce stages? Also, are order by's supported in the current version? If not, when will they be?

I haven't had a chance to play with Hive yet, but I intend to do so pretty soon :))


Shirley

Ashish Thusoo wrote:

Hi Shirley,
I think this query would give you an error currently, for two reasons:

1. The select list does not contain the group by column, and distinct track group by user is indeterminate: what value of track do you want to report here on the group of users?
2. We do not have order by yet, though you can sort of simulate it with a sort by clause and 1 reducer.

Can you explain more in terms of what you want the SQL to achieve?

Also, the Hive wiki contains a number of presentations that describe how SQL gets converted to Map/Reduce plans at a high level. Check them out at http://wiki.apache.org/hadoop/Hive/Presentations

Ashish
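To see why "sort by with 1 reducer" approximates order by, here is a minimal Python simulation (not Hive code). It assumes rows are partitioned across reducers and that each reducer sorts only the rows it receives, with no global merge afterwards. The partitioner below is a hypothetical deterministic stand-in for hash partitioning, and all function names are illustrative.

```python
from typing import List, Tuple

Row = Tuple[str, int]  # (user, count), e.g. the output of a GROUP BY user

def partition(user: str, num_reducers: int) -> int:
    """Hypothetical deterministic partitioner standing in for hash partitioning."""
    return sum(map(ord, user)) % num_reducers

def sort_by_count_desc(rows: List[Row], num_reducers: int) -> List[List[Row]]:
    """Simulate 'SORT BY count DESC': each reducer sorts only its own partition."""
    partitions: List[List[Row]] = [[] for _ in range(num_reducers)]
    for row in rows:
        partitions[partition(row[0], num_reducers)].append(row)
    return [sorted(p, key=lambda r: r[1], reverse=True) for p in partitions]

def is_globally_sorted(partitions: List[List[Row]]) -> bool:
    """True if concatenating the reducer outputs yields one descending order."""
    counts = [count for p in partitions for _, count in p]
    return all(a >= b for a, b in zip(counts, counts[1:]))

rows = [("alice", 3), ("bob", 7), ("carol", 5), ("dave", 1)]
print(is_globally_sorted(sort_by_count_desc(rows, 1)))  # True
print(is_globally_sorted(sort_by_count_desc(rows, 2)))  # False
```

With one reducer every row lands in the same partition, so its local sort is also a total order; with two reducers each output file is sorted, but their concatenation is not.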
------------------------------------------------------------------------
*From:* Jeff Hammerbacher [mailto:[email protected]]
*Sent:* Saturday, December 20, 2008 11:35 AM
*To:* [email protected]
*Subject:* Re: question about mapred translation

Hey Shirley,
Welcome to Hive! Once you've gotten Hive up and running and have created the "music" table, you should be able to say "EXPLAIN <query>", where <query> is the query specified below (or any other query). For more detailed information, you can say "EXPLAIN EXTENDED <query>".
The output from the EXPLAIN might be a little obtuse, so if you still have questions, I'm sure more knowledgeable Hive folks can give you insight into the planner.
Regards,
Jeff
On Sat, Dec 20, 2008 at 11:48 AM, Shirley Cohen <[email protected]> wrote:
    Hi,

    I'm just getting started with Hive. I was wondering if anyone can
    tell me how Hive would translate the following SQL query into
    MapReduce:

    select distinct track, count(track)
    from music
    where date_listened between '12-10-2008' and '12-11-2008'
    group by user
    order by count(track) desc

    How many mapred jobs would it use? What would the map and reduce
    functions look like? Also, does Hive have a utility that gives
    you this information?

    Thanks,

    Shirley
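For a rough picture of what the map and reduce functions could look like, here is a minimal Python simulation. It is not Hive code and not Hive's actual plan (EXPLAIN shows that). Because the query as written is indeterminate, the sketch assumes the intent is to count listens per user within the date range and list users by that count, descending. It uses two jobs: job 1 applies the WHERE filter and aggregates by user; job 2 sorts the aggregated rows with a single reducer, as in the sort-by workaround. The row layout and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Listen = Tuple[str, str, str]  # hypothetical "music" row: (user, track, date_listened)

def job1_map(rows: Iterable[Listen]) -> Iterator[Tuple[str, int]]:
    """Job 1 map: apply the WHERE clause, then emit (user, 1) for each listen."""
    for user, _track, date_listened in rows:
        # Plain string comparison, as with BETWEEN on the string literals in the query.
        if "12-10-2008" <= date_listened <= "12-11-2008":
            yield user, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Framework shuffle: group map output values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def job1_reduce(groups: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    """Job 1 reduce: count(track) per user."""
    return [(user, sum(ones)) for user, ones in groups.items()]

def job2_sort(counts: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Job 2 with a single reducer: sort by count descending, ties broken by user."""
    return sorted(counts, key=lambda r: (-r[1], r[0]))

def run_query(rows: Iterable[Listen]) -> List[Tuple[str, int]]:
    return job2_sort(job1_reduce(shuffle(job1_map(rows))))

music = [
    ("alice", "t1", "12-10-2008"),
    ("alice", "t2", "12-11-2008"),
    ("bob", "t1", "12-10-2008"),
    ("bob", "t3", "12-12-2008"),  # outside the date range, filtered out
    ("carol", "t2", "12-11-2008"),
    ("carol", "t2", "12-11-2008"),
    ("carol", "t4", "12-10-2008"),
]
print(run_query(music))  # [('carol', 3), ('alice', 2), ('bob', 1)]
```

The group by fits in one pass because the shuffle already brings all rows for a user to the same reducer; the global ordering needs a second pass because job 1's reducers each see only some of the users.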
