Yeah. The sort merge bucket mapjoin has been finished for some time
and seems stable now. I did one skew join but haven't gotten a chance to
look at the other skew join Namit mentioned to me. But I definitely should
have updated the wiki earlier. My bad.

On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher <[email protected]> wrote:
> Yongqiang mentioned he was going to update the wiki with this information in
> the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
>
> Yongqiang, have you gotten a chance to complete the sort merge bucket map
> join and the other skew join you mention in the above thread?
>
> Thanks,
> Jeff
>
> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
> <[email protected]> wrote:
>>
>> Roberto ..
>>
>> You may find these links useful:
>>
>>
>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
>> - Simple joins and optimizations..
>>
>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team -
>> New kind of joins / features of hive ..
>>
>> Thanks
>>
>> Bharath.V
>> 4th year Undergraduate..
>> IIIT Hyderabad
>>
>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto
>> <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I cannot find any documentation about what algorithm Hive uses to
>>> translate JOIN clauses into Map-Reduce tasks.
>>>
>>> In particular, suppose I have two tables A and B, each stored in a
>>> separate file, and each file is split across Hadoop nodes. When I perform a
>>> JOIN on A.column = B.column, the framework has to compare all the data from
>>> the first file with all the data from the second file. How can Hadoop
>>> perform a full scan of all possible combinations of values? If each node
>>> contains only a portion of each file, a complete comparison does not seem
>>> possible. Is one of the two files entirely replicated on each node? Or
>>> does Hive use another kind of strategy/optimization?
>>>
>>> Thanks.
>
>
