RE: How HIVE manages a join

Joydeep Sen Sarma Thu, 12 Aug 2010 00:19:39 -0700

i hate this message: 'THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join 
Syntax'


why must edits to the wiki be banned if there are xdocs? hadoop has both.

there will always be things that are not captured in xdocs. it's pretty sad to 
discourage free form edits by people who want to contribute without checking 
out source. (what is this - the 80s?)
________________________________________
From: Edward Capriolo [[email protected]]
Sent: Tuesday, August 10, 2010 2:57 PM
To: [email protected]
Cc: [email protected]
Subject: Re: How HIVE manages a join

Sorry.
$hive_root/docs/xdocs/language_manual/joins.xml

On Tue, Aug 10, 2010 at 5:57 PM, Edward Capriolo <[email protected]> wrote:
> This page is already in version control.
>
> /home/edward/cassandra-handler/docs/xdocs/language_manual/joins.xml
>
> Edward
>
> On Tue, Aug 10, 2010 at 5:15 PM, Carl Steinbach <[email protected]> wrote:
>> Hi Yongqiang,
>> Please go ahead and update the wiki page. I will copy it over to version
>> control when you are done.
>> Thanks.
>> Carl
>>
>> On Tue, Aug 10, 2010 at 2:11 PM, yongqiang he <[email protected]>
>> wrote:
>>>
>>> In the Hive Join wiki page, it says
>>> "THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join Syntax"
>>>
>>> Where should i do the update?
>>>
>>> On Fri, Aug 6, 2010 at 11:46 PM, yongqiang he <[email protected]>
>>> wrote:
>>> > Yeah. The sort merge bucket mapjoin has been finished for some time,
>>> > and seems stable now. I did one skew join but haven't gotten a chance to
>>> > look at the other skew join Namit mentioned to me. But I definitely
>>> > should have updated the wiki earlier. My bad.
>>> >
>>> > On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher <[email protected]>
>>> > wrote:
>>> >> Yongqiang mentioned he was going to update the wiki with this
>>> >> information in
>>> >> the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
>>> >>
>>> >> Yongqiang, have you gotten a chance to complete the sort merge bucket
>>> >> map
>>> >> join and the other skew join you mention in the above thread?
>>> >>
>>> >> Thanks,
>>> >> Jeff
>>> >>
>>> >> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
>>> >> <[email protected]> wrote:
>>> >>>
>>> >>> Roberto ..
>>> >>>
>>> >>> You may find these links useful ..
>>> >>>
>>> >>>
>>> >>>
>>> >>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
>>> >>> - Simple joins and optimizations..
>>> >>>
>>> >>>
>>> >>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team  
>>> >>> -
>>> >>> New kind of joins / features of hive ..
>>> >>>
>>> >>> Thanks
>>> >>>
>>> >>> Bharath.V
>>> >>> 4th year Undergraduate..
>>> >>> IIIT Hyderabad
>>> >>>
>>> >>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto
>>> >>> <[email protected]> wrote:
>>> >>>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> I cannot find any documentation about what algorithm HIVE uses to
>>> >>>> translate JOIN clauses to Map-Reduce tasks.
>>> >>>>
>>> >>>> In particular, if I have two tables A and B, each table is written to a
>>> >>>> separate file and each file is split across hadoop nodes. When I perform
>>> >>>> a JOIN with A.column = B.column, the framework has to compare the full
>>> >>>> data from the first file with the full data from the second file. How
>>> >>>> can hadoop perform a full scan of all possible combinations of values?
>>> >>>> If each node contains a portion of each file, a complete comparison does
>>> >>>> not seem possible. Is one of the two files entirely replicated on each
>>> >>>> node? Or does HIVE use another kind of strategy/optimization?
>>> >>>>
>>> >>>> Thanks.
>>> >>
>>> >>
>>> >
>>
>>
>
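
The original question at the bottom of this thread asks how rows with equal
join keys can ever meet when both tables are split across nodes. As a rough
sketch only (plain Python in a single process, not Hive's actual code; the
function names and sample rows are hypothetical), two general strategies look
like this: a reduce-side join, which regroups the rows of both tables by join
key so that each key is handled in one place, and a map-side join, which copies
the smaller table to every worker. The second is the idea behind the mapjoin
mentioned in the thread.

```python
from collections import defaultdict


def reduce_side_join(table_a, table_b, key_a, key_b):
    """Simulate a shuffle (reduce-side) equi-join in a single process."""
    # "Map" phase: every row is emitted under its join key and tagged by
    # source table (slot 0 holds rows from A, slot 1 holds rows from B).
    shuffled = defaultdict(lambda: ([], []))
    for row in table_a:
        shuffled[row[key_a]][0].append(row)
    for row in table_b:
        shuffled[row[key_b]][1].append(row)

    # "Reduce" phase: all rows sharing a key arrive at the same reducer,
    # so only rows with equal keys are ever compared with each other.
    joined = []
    for key in sorted(shuffled):
        rows_a, rows_b = shuffled[key]
        for a in rows_a:
            for b in rows_b:
                joined.append((a, b))
    return joined


def map_side_join(big_table, small_table, key_big, key_small):
    """Simulate a map-side join: the small table is replicated to each worker."""
    # Build an in-memory lookup from the small table (assumed to fit in memory).
    lookup = defaultdict(list)
    for row in small_table:
        lookup[row[key_small]].append(row)

    # Each worker streams its own slice of the big table and probes the lookup;
    # no shuffle is needed because every worker holds the whole small table.
    return [
        (big, small)
        for big in big_table
        for small in lookup.get(big[key_big], [])
    ]


# Hypothetical sample data, standing in for tables A and B from the question.
A = [{"id": 1, "x": "a1"}, {"id": 2, "x": "a2"}, {"id": 2, "x": "a2b"}]
B = [{"id": 2, "y": "b2"}, {"id": 3, "y": "b3"}]

shuffle_result = reduce_side_join(A, B, "id", "id")
mapjoin_result = map_side_join(A, B, "id", "id")
```

In both cases only the rows with id 2 are paired; no node has to compare every
row of A with every row of B. Which strategy fits depends mostly on whether one
table is small enough to hold in memory on each worker.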
