Re: How HIVE manages a join

Raghu Murthy Thu, 12 Aug 2010 14:13:41 -0700

The hive.pdf link in the Design page is this one:
http://www.slideshare.net/namit_jain/hive-demo-paper-at-vldb-2009


A later paper in ICDE'10 is available here:
http://i.stanford.edu/~ragho/hive-icde2010.pdf

Both of these papers and others are linked from:
http://wiki.apache.org/hadoop/Hive/Presentations

Hope this helps.


On Aug 12, 2010, at 2:05 PM, akshaya iyengar wrote:

Hello,
I apologize if this is off topic for the current thread. I was looking for 
the Hive architecture diagram on this page 
http://wiki.apache.org/hadoop/Hive/Design . The pdf link doesn't seem to work 
for me either.

It would be a great help if someone could direct me to this information.

Thanks,
Akshaya

On Thu, Aug 12, 2010 at 4:37 PM, Edward Capriolo <[email protected]> wrote:
Joydeep,

I am sorry. I put that there when I thought we were going to actively move
to xdocs. You can remove it if you like.

As I said in an earlier thread, the problem with the wiki is that no one
actively updates it. Example:

http://wiki.apache.org/hadoop/Hive/LanguageManual/Select
Oops. Really? What about "IN" support...
https://issues.apache.org/jira/browse/HIVE-801

Which is why I hold the opinion that all patches except bug fixes
should probably come with xdocs. People are free to disagree.

Edward

On Thu, Aug 12, 2010 at 3:16 AM, Joydeep Sen Sarma <[email protected]> wrote:
> i hate this message: 'THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join 
> Syntax'
>
> why must edits to the wiki be banned if there are xdocs? hadoop has both.
>
> there will always be things that are not captured in xdocs. it's pretty sad 
> to discourage free form edits by people who want to contribute without 
> checking out source. (what is this - the 80s?)
> ________________________________________
> From: Edward Capriolo [[email protected]]
> Sent: Tuesday, August 10, 2010 2:57 PM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: How HIVE manages a join
>
> Sorry.
> $hive_root/docs/xdocs/language_manual/joins.xml
>
> On Tue, Aug 10, 2010 at 5:57 PM, Edward Capriolo <[email protected]> wrote:
>> This page is already in version control.
>>
>> /home/edward/cassandra-handler/docs/xdocs/language_manual/joins.xml
>>
>> Edward
>>
>> On Tue, Aug 10, 2010 at 5:15 PM, Carl Steinbach <[email protected]> wrote:
>>> Hi Yongqiang,
>>> Please go ahead and update the wiki page. I will copy it over to version
>>> control when you are done.
>>> Thanks.
>>> Carl
>>>
>>> On Tue, Aug 10, 2010 at 2:11 PM, yongqiang he <[email protected]> wrote:
>>>>
>>>> In the Hive Join wiki page, it says
>>>> "THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join Syntax"
>>>>
>>>> Where should I make the update?
>>>>
>>>> On Fri, Aug 6, 2010 at 11:46 PM, yongqiang he <[email protected]> wrote:
>>>> > Yeah. The sort merge bucket mapjoin has been finished for some time,
>>>> > and seems stable now. I did one skew join but haven't had a chance to
>>>> > look at another skew join Namit mentioned to me. But I definitely
>>>> > should have updated the wiki earlier. My bad.
>>>> >
>>>> > On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher <[email protected]> wrote:
>>>> >> Yongqiang mentioned he was going to update the wiki with this
>>>> >> information in the thread at
>>>> >> http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
>>>> >>
>>>> >> Yongqiang, have you gotten a chance to complete the sort merge bucket
>>>> >> map join and the other skew join you mention in the above thread?
>>>> >>
>>>> >> Thanks,
>>>> >> Jeff
>>>> >>
>>>> >> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada <[email protected]> wrote:
>>>> >>>
>>>> >>> Roberto ..
>>>> >>>
>>>> >>> You may find these links useful:
>>>> >>>
>>>> >>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
>>>> >>> - Simple joins and optimizations
>>>> >>>
>>>> >>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team
>>>> >>> - New kinds of joins / features of Hive
>>>> >>>
>>>> >>> Thanks
>>>> >>>
>>>> >>> Bharath.V
>>>> >>> 4th year Undergraduate..
>>>> >>> IIIT Hyderabad
>>>> >>>
>>>> >>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto <[email protected]> wrote:
>>>> >>>>
>>>> >>>> Hi,
>>>> >>>>
>>>> >>>> I cannot find any documentation about which algorithm Hive uses to
>>>> >>>> translate JOIN clauses into Map-Reduce tasks.
>>>> >>>>
>>>> >>>> In particular, suppose I have two tables A and B, each stored in a
>>>> >>>> separate file, and each file is split across Hadoop nodes. When I
>>>> >>>> perform a JOIN with A.column = B.column, the framework has to compare
>>>> >>>> all the data from the first file with all the data from the second
>>>> >>>> file. How can Hadoop perform a full scan of all possible combinations
>>>> >>>> of values? If each node contains only a portion of each file, a
>>>> >>>> complete comparison does not seem possible. Is one of the two files
>>>> >>>> entirely replicated on each node? Or does Hive use another kind of
>>>> >>>> strategy/optimization?
>>>> >>>>
>>>> >>>> Thanks.
>>>> >>
>>>> >>
>>>> >
>>>
>>>
>>
>
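
For readers landing on the original question (how a JOIN on
A.column = B.column can work when both files are split across nodes), here
is a minimal sketch of a reduce-side join, the general approach the papers
linked above describe for Hive's common join. No node needs to see both
files in full: mappers tag each row with its source table and emit the join
key, the shuffle brings all rows with the same key together, and the reducer
pairs them. This is a toy in-memory simulation, not Hive's code, and every
function name below is hypothetical.

```python
from collections import defaultdict
from itertools import product


def map_phase(table_tag, rows, key_index):
    """Emit (join_key, (table_tag, row)) for each row, as a mapper would."""
    for row in rows:
        yield row[key_index], (table_tag, row)


def shuffle(pairs):
    """Group values by key; stands in for the MapReduce shuffle/sort."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each key, pair every A row with every B row (inner join)."""
    for key in sorted(groups):
        a_rows = [row for tag, row in groups[key] if tag == "A"]
        b_rows = [row for tag, row in groups[key] if tag == "B"]
        for a, b in product(a_rows, b_rows):
            yield a + b


def reduce_side_join(a_rows, b_rows, a_key=0, b_key=0):
    """Run the three simulated phases and return the joined rows."""
    pairs = list(map_phase("A", a_rows, a_key))
    pairs += list(map_phase("B", b_rows, b_key))
    return list(reduce_phase(shuffle(pairs)))


# Illustrative data only (not from the thread).
table_a = [(1, "alice"), (2, "bob"), (3, "carol")]
table_b = [(1, "nyc"), (3, "sf"), (3, "la"), (4, "tokyo")]
result = reduce_side_join(table_a, table_b)
for row in result:
    print(row)
```

The other strategy the question guesses at, replicating one table in full
to every node, corresponds to a map-side join ("mapjoin" in this thread) and
is only practical when one of the tables is small enough to fit in memory.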
