Re: How HIVE manages a join

Jeff Hammerbacher Fri, 06 Aug 2010 20:33:23 -0700

Yongqiang mentioned he was going to update the wiki with this information in
the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.


Yongqiang, have you gotten a chance to complete the sort merge bucket map
join and the other skew join you mention in the above thread?

Thanks,
Jeff

On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada <
[email protected]> wrote:

> Roberto ..
>
> You can find these links useful ..
>
>
> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551 -
> Simple joins and optimizations..
>
> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team  -
> New kind of joins / features of hive ..
>
> Thanks
>
> Bharath.V
> 4th year Undergraduate..
> IIIT Hyderabad
>
>
> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto <
> [email protected]> wrote:
>
>> Hi,
>>
>> I cannot find any documentation about what algorithm HIVE uses to
>> translate JOIN clauses into Map-Reduce tasks.
>>
>> In particular, suppose I have two tables A and B, each stored in a
>> separate file, and each file is split across hadoop nodes. When I perform a
>> JOIN with A.column = B.column, the framework has to compare the full data
>> from the first file with the full data from the second file. How can hadoop
>> perform a full scan of all possible combinations of values? If each node
>> contains only a portion of each file, a complete comparison does not seem
>> possible. Is one of the two files entirely replicated on each node? Or does
>> HIVE use another kind of strategy/optimization?
>>
>> Thanks.
>
>
>
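As a rough illustration of the question above, here is a minimal Python sketch of the general reduce-side (shuffle) join idea: each mapper tags its rows with the source table and emits them under the join key, the shuffle routes all rows with the same key to one reducer, and the reducer pairs the A rows with the B rows for that key. This way neither file has to be fully replicated; only rows sharing a key must meet. The function names and the in-memory "splits" are assumptions made for illustration and are not Hive's actual code.

```python
from collections import defaultdict
from itertools import product


def map_phase(table_tag, rows, key_index):
    """Emit (join_key, (table_tag, row)) for every row in one input split."""
    for row in rows:
        yield row[key_index], (table_tag, row)


def shuffle(pairs):
    """Group mapper output by join key, standing in for the shuffle step."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_phase(groups):
    """For each key, pair every A row with every B row that shares it."""
    for tagged_rows in groups.values():
        a_rows = [row for tag, row in tagged_rows if tag == "A"]
        b_rows = [row for tag, row in tagged_rows if tag == "B"]
        for a, b in product(a_rows, b_rows):
            yield a + b


def reduce_side_join(a_splits, b_splits, a_key=0, b_key=0):
    """Inner-join two tables, each given as a list of splits (lists of tuples)."""
    pairs = []
    for split in a_splits:
        pairs.extend(map_phase("A", split, a_key))
    for split in b_splits:
        pairs.extend(map_phase("B", split, b_key))
    return sorted(reduce_phase(shuffle(pairs)))


# Hypothetical data: each table is spread over two "nodes" (splits).
a_splits = [[(1, "a1")], [(2, "a2"), (1, "a3")]]
b_splits = [[(1, "b1"), (3, "b3")], [(2, "b2")]]
joined = reduce_side_join(a_splits, b_splits)
```

The alternative raised in the question, replicating one table to every node, corresponds to a map-side join, which is practical only when one of the tables is small enough to fit in each mapper's memory.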
